We investigate a method for extracting nonlinear principal components (NPCs). These NPCs maximize variation subject to smoothness and orthogonality constraints; but we allow for a general class of constraints and multivariate probability densities, including densities without compact support and even densities with algebraic tails. We provide primitive sufficient conditions for the existence of these NPCs. By exploiting the theory of continuous-time, reversible Markov diffusion processes, we give a different interpretation of these NPCs and the smoothness constraints. When the diffusion matrix is used to enforce smoothness, the NPCs maximize long-run variation relative to the overall variation subject to orthogonality constraints. Moreover, the NPCs behave as scalar autoregressions with heteroskedastic innovations; this supports semiparametric identification and estimation of a multivariate reversible diffusion process and tests of the overidentifying restrictions implied by such a process from low frequency data. We also explore implications for stationary, possibly non-reversible diffusion processes. Finally, we suggest a sieve method to estimate the NPCs from discretely-sampled data.



1. Introduction. Principal components are functions of the data that capture maximal variation in some sense. Often they are restricted to be linear functions of the underlying data as in original analyses of Pearson [27] and Hotelling [23]. In this paper we study the extraction of nonlinear principal components (NPCs) using information encoded in the probability density of the data. Formally, the NPCs maximize variation subject to orthogonality and smoothness constraints where smoothness constraints are enforced by a quadratic form $f$ expressed in terms of the gradients of functions. Specifically, the quadratic form is

$$f(\phi,\psi) = \frac{1}{2}\int_{x\in\Omega} (\nabla\phi)'\,\Sigma\,(\nabla\psi)\, q,$$

where $\nabla$ denotes the (weak) gradient operator, $\Omega$ is the state space, $\Sigma$ is a state-dependent positive-definite matrix, and $q$ is the invariant density of strictly stationary ergodic data $\{x_j\}$.

Alternatively, NPCs are solutions to approximation problems. Suppose 
we wish to form the best finite-dimensional least squares approximation to 
an infinite-dimensional space of smooth functions, where we use the form 
$f$ to limit the class of functions to be approximated. In a sense that we
make formal, a finite number of NPCs solves this problem. More stringent 
smoothness restrictions enforced by penalization limit the family of functions 
to be approximated while improving the overall quality of approximation. 
Thus our analysis of NPCs is in part an investigation of this approximation. 

Previously Box and Tiao [7] proposed a canonical analysis of multivariate 
linear time series. This analysis produces linear principal components of the 
multivariate process that can be ordered from least to most predictable. 



Much later in a seemingly unrelated paper, Salinelli [31] defined NPCs for multivariate absolutely continuous random variables and characterized these NPCs as eigenfunctions of a self-adjoint differential operator. As we will show, these two methods are related. We share Salinelli's [31] interest in NPCs, but our departure from his work is substantial. For Salinelli, the matrix $\Sigma$ is the identity matrix, the state space $\Omega$ is compact and the density $q$ is bounded above and below for the bulk of his analysis. Our interest in probability densities $q$ that do not have compact support, including densities with algebraic tails, leads us naturally to consider a more general class of smoothness penalties. By allowing for a more flexible specification for $\Sigma$ and $q$, we entertain a larger class of smoothness constraints relative to Salinelli [31], with explicit links to the data generation. Establishing the existence of NPCs in our setup is no longer routine.



Salinelli [31] assumed that the data generation process is independent and identically distributed (IID). While our analysis is applicable to such an environment, we also explore the case in which the data $\{x_j\}$ are sampled in low frequency from a stationary Markov diffusion process. By considering such processes, we make a specific choice of the matrix $\Sigma$ used to enforce smoothness: it is the local covariance or diffusion matrix. With this choice, the NPCs extracted with smoothness penalties are ordered by the ratio of their long-run variation to the overall variation as in Box and Tiao [7]. NPCs that capture variation subject to smoothness constraints also display low frequency variation due to their high persistence. In effect we provide an extension of the method of Box and Tiao [7] to nonlinear, multivariate Markov diffusions, and establish an explicit link to the method of Salinelli [31].

In this paper we do the following: 

1. Formulate the NPCs extraction to include state dependence in the 
smoothness constraint and state spaces that have infinite Lebesgue 
measure. 

2. Give sufficient conditions for the existence of these NPCs. 

3. Provide a reversible Markov diffusion process for the data generation 
that supports the NPCs extraction method and generates testable im- 
plications. 

4. Explore implications for a more general class of Markov diffusion pro- 
cesses. 

The rest of the paper is organized as follows. In Section 2 we first define NPCs as functions that maximize variation subject to orthogonality conditions and smoothness bounds given by the quadratic form $f$. Section 3 presents existence results. In Section 4 we suppose the data are sampled from a multivariate nonlinear diffusion and establish the connection between our NPCs and the canonical analysis of Box and Tiao [7]. The results in Section 5 relate the NPCs to eigenfunctions of conditional expectations operators associated with a stationary Markov process $\{x_t\}$ defined using the diffusion matrix $\Sigma$ and the stationary density $q$. Given an eigenfunction $\psi$, the process $\{\psi(x_t)\}$ behaves as a scalar autoregression. Thus the eigenfunctions we obtain satisfy testable implications when the data are generated by a Markov process. The Markov process constructed in Section 5 is time reversible. In Section 6 we characterize other Markov processes associated with the same $q$ and $\Sigma$. Section 7 provides a sieve method to estimate these NPCs using discrete-time low frequency observations $\{x_j\}$. Section 8 gives some concluding remarks and discusses applications of our results. The appendices contain computations associated with an example and some proofs that are not stated in the main text.

2. Nonlinear principal components. To define a functional notion of principal components we require two quadratic forms. We start with an open connected $\Omega\subseteq\mathbb{R}^n$. Let $q$ be a probability density on $\Omega$ with respect to Lebesgue measure. The implied probability distribution $Q$ is the population counterpart to the empirical distribution of the data. The data could be IID as in Salinelli [31], but we are primarily interested in the case in which the data $\{x_j\}$ are sampled in low frequency from a continuous-time, stationary Markov diffusion $\{x_t : t\ge 0\}$. In this case $q$ is the stationary density of $x_t$.

Let $L^2$ denote the space of Borel measurable square integrable functions with respect to the population probability distribution $Q$. The $L^2$ inner product (denoted $\langle\cdot,\cdot\rangle$) is one of the two forms of interest. We use the corresponding norm to define an approximation criterion.

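The second form is used to measure smoothness. Consider a (quadratic) form $f_o$ defined on $C_K^2$, the space of twice continuously differentiable functions with compact support in $\Omega$, that can be parameterized in terms of the density $q$ and a positive definite matrix $\Sigma$ that can depend on the state:

$$f_o(\phi,\psi) = \frac{1}{2}\sum_{i,j=1}^{n}\int_{\Omega}\sigma_{ij}\,\frac{\partial\phi}{\partial x_i}\,\frac{\partial\psi}{\partial x_j}\,q \qquad (2.1)$$

where $\Sigma = [\sigma_{ij}]$.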



Assumption 2.1. $q$ is a positive, continuously differentiable probability density on $\Omega$.

Assumption 2.2. $\Sigma$ is a continuously differentiable, positive definite matrix function on $\Omega$.

Assumptions 2.1 and 2.2 restrict the density $q$ and the matrix $\Sigma$ to be continuously differentiable. These assumptions are made for convenience. As argued by Davies (see Theorem 1.2.5), these restrictions can be replaced by a less stringent requirement that the entries of the matrix $q\Sigma$ are locally (in $L^1$(Lebesgue)) weakly differentiable.

While $f_o$ is constructed in terms of the product $q\Sigma$, the density $q$ will play a distinct role when we consider extending the domain of the form to a larger set of functions.

To study the case in which $\Omega$ is not compact, we will consider a particular closed extension of the form $f_o$. We extend the form $f_o$ to a larger domain $H\subset L^2$ using the notion of a weak derivative:

$$H = \left\{\phi\in L^2 : \text{there exists } g \text{ measurable, with } \int g'\Sigma g\,q < \infty, \text{ and } \int\phi\,\nabla\psi = -\int g\,\psi \text{ for all } \psi\in C_K^1\right\}.$$
The random vector $g$ is unique (for each $\phi$) and is referred to as the weak derivative of $\phi$. From now on, for each $\phi$ in $H$ we write $\nabla\phi = g$.

Notice that $H$ is constructed exactly as a weighted Sobolev space except that instead of requiring that $g\in L^2$, we require that $\Lambda g\in L^2$ where $\Lambda$ is the square root of $\Sigma$. Also we use $C_K^1$ test functions. One can show, using mollifiers, that allowing for this larger set of test functions is equivalent to using the more usual set of test functions (see Brezis, Remark 1, page 150). For any pair of functions $\psi$ and $\phi$ in $H$ we define:

$$f(\phi,\psi) = \frac{1}{2}\int(\nabla\phi)'\,\Sigma\,(\nabla\psi)\,q,$$

which is an extension of $f_o$. In $H$ we use the inner product $\langle\phi,\psi\rangle_f = \langle\phi,\psi\rangle + f(\phi,\psi)$. With this inner product, $H$ is complete and hence a Hilbert space (see Proposition A.1 in the Appendix). Thus $H$ is taken to be the domain $D(f)$ of the form $f$. Notice, in particular, that the unit function is in $D(f) = H$.
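When draws from $q$ are available, both forms can be approximated by sample averages. The sketch below is a minimal illustration under assumptions that are not part of the paper: Monte Carlo draws stand in for $q$, gradients are supplied in closed form, and the helper names (`inner`, `form_f`, `inner_f`) are hypothetical.

```python
import numpy as np


def inner(phi, psi, draws):
    """Monte Carlo estimate of <phi, psi> = E_q[phi(x) psi(x)]."""
    return np.mean(phi(draws) * psi(draws))


def form_f(grad_phi, grad_psi, sigma, draws):
    """Monte Carlo estimate of f(phi, psi) = (1/2) E_q[(grad phi)' Sigma (grad psi)].

    draws: (m, n) array of draws from q (assumed available).
    grad_phi, grad_psi: callables mapping (m, n) -> (m, n) gradients.
    sigma: callable mapping (m, n) -> (m, n, n) positive definite matrices.
    """
    g1, g2, s = grad_phi(draws), grad_psi(draws), sigma(draws)
    # Per-draw quadratic form g1' S g2, then average and halve.
    return 0.5 * np.mean(np.einsum("mi,mij,mj->m", g1, s, g2))


def inner_f(phi, psi, grad_phi, grad_psi, sigma, draws):
    """Hilbert-space inner product <phi, psi>_f = <phi, psi> + f(phi, psi) on H."""
    return inner(phi, psi, draws) + form_f(grad_phi, grad_psi, sigma, draws)


# Hypothetical example: q standard bivariate normal, Sigma = identity,
# phi(x) = x1**2 + x2, so grad phi = (2 x1, 1) and f(phi, phi) = (1/2) E[4 x1**2 + 1] = 2.5.
rng = np.random.default_rng(0)
x = rng.standard_normal((200_000, 2))
phi = lambda z: z[:, 0] ** 2 + z[:, 1]
grad = lambda z: np.stack([2.0 * z[:, 0], np.ones(len(z))], axis=1)
identity = lambda z: np.broadcast_to(np.eye(2), (len(z), 2, 2))
f_hat = form_f(grad, grad, identity, x)
```

In this example $\langle\phi,\phi\rangle = E[(x_1^2+x_2)^2] = 4$, so `inner_f(phi, phi, grad, grad, identity, x)` should be close to $6.5$.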

2.1. Initial construction. NPCs maximize variation subject to smoothness constraints. In our generalization these NPCs are defined as follows.

Definition 2.1. The function $\psi_j$ is the $j$th nonlinear principal component (NPC) for $j\ge 1$ if $\psi_j$ solves:

$$\max_{\phi\in H}\ \langle\phi,\phi\rangle$$

subject to

$$f(\phi,\phi) = 1, \qquad \langle\psi_s,\phi\rangle = 0,\quad s = 0,\ldots,j-1,$$

where $\psi_0$ is initialized to be the constant function one.



There are two differences between our proposed extraction and that of Salinelli [31]. First, Salinelli [31] assumes that $\Sigma$ is the identity matrix. To accommodate a richer class of densities, we allow $\Sigma$ to be state dependent. Second, Salinelli [31] assumes that the state space has finite Lebesgue measure and that the density $q$ is bounded away from zero. We allow the Lebesgue measure of the state space to be infinite, and accordingly our density $q$ is no longer assumed to be bounded from below.

NPCs are eigenfunctions of the quadratic form $f$.



6 CHEN, HANSEN AND SCHEINKMAN 

Definition 2.2. An eigenfunction $\psi$ of the quadratic form $f$ satisfies:

$$f(\phi,\psi) = \delta\,\langle\phi,\psi\rangle \qquad (2.2)$$

for all $\phi\in D(f)$. The scalar $\delta$ is the corresponding eigenvalue.

Since $f$ is positive semidefinite, $\delta$ must be nonnegative. The NPCs extracted in the manner given in Definition 2.1 have eigenvalues $\delta_j$ that increase with $j$. If we renormalize the eigenfunctions to have a unit second moment, the NPCs will be ordered by their smoothness as measured by $\delta_j = f(\psi_j,\psi_j)$. Moreover, $f(\psi_j,\psi_k) = 0$ for $j\ne k$.
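The eigenvalue ordering can be made concrete on a finite basis $b_0,\ldots,b_K$ by solving the generalized eigenvalue problem $Fc = \delta Gc$, where $G[a,b] = \langle b_a, b_b\rangle$ and $F[a,b] = f(b_a,b_b)$. This is a hypothetical Galerkin illustration, not the estimator of Section 7: it assumes $\Omega = \mathbb{R}$, $q$ standard normal and constant $\Sigma = 2$, a case in which the NPCs are the Hermite polynomials $He_j$ and $\delta_j = j$. The function name `galerkin_npcs` and its parameters are assumptions.

```python
import numpy as np
from numpy.polynomial import hermite_e


def galerkin_npcs(K, sigma2=2.0, nodes=60):
    """Approximate NPCs on the basis 1, x, ..., x**K (hypothetical setup:
    Omega = R, q = N(0, 1), constant Sigma = sigma2)."""
    # Probabilists' Gauss-Hermite rule integrates against exp(-x**2 / 2);
    # dividing the weights by sqrt(2 pi) turns sums into N(0, 1) expectations.
    x, w = hermite_e.hermegauss(nodes)
    w = w / np.sqrt(2.0 * np.pi)
    p = np.arange(K + 1)
    B = x[:, None] ** p                              # basis functions at the nodes
    dB = np.zeros_like(B)
    dB[:, 1:] = p[1:] * x[:, None] ** (p[1:] - 1)    # their derivatives
    G = B.T @ (w[:, None] * B)                       # G[a, b] = <b_a, b_b>
    F = 0.5 * sigma2 * (dB.T @ (w[:, None] * dB))    # F[a, b] = f(b_a, b_b)
    # Reduce F c = delta G c to a standard symmetric problem using G = L L'.
    L_inv = np.linalg.inv(np.linalg.cholesky(G))
    A = L_inv @ F @ L_inv.T
    delta, Y = np.linalg.eigh(0.5 * (A + A.T))       # ascending eigenvalues
    C = L_inv.T @ Y                                  # columns satisfy C' G C = I
    return delta, C


delta, C = galerkin_npcs(5)
```

The columns of `C` are basis coefficients normalized to unit second moment and mutually orthogonal, and `C.T @ F @ C` is diagonal with entries `delta`; the first column is the constant $\psi_0$ with $\delta_0 = 0$.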

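Suppose that the NPCs $\{\psi_j : j = 0,1,\ldots\}$ exist with corresponding eigenvalues $\{\delta_j : j = 0,1,\ldots\}$, and normalize them to have unit second moments. Consider any $\phi$ in $L^2$. Then

$$\phi = \sum_{j=0}^{\infty}\langle\phi,\psi_j\rangle\,\psi_j,$$

and for any $\phi,\psi\in D(f)$,

$$f(\phi,\psi) = \sum_{j=0}^{\infty}\delta_j\,\langle\phi,\psi_j\rangle\,\langle\psi,\psi_j\rangle.$$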

2.2. Benchmark optimization problem. Let $H$ be a closed linear subspace of $L^2$, and consider the optimization problem:

Problem 2.1.

$$\max_{\phi\in H}\ \langle\phi,\phi\rangle$$

subject to

$$\theta\langle\phi,\phi\rangle + f(\phi,\phi) \le 1$$

for some $\theta > 0$.

A necessary condition for $\psi$ to be a NPC is that it satisfies an eigenvalue problem:

Claim 2.1. A solution $\psi$ to Problem 2.1 will also solve the eigenvalue problem:

$$\langle\phi,\psi\rangle = \lambda\left[\theta\langle\phi,\psi\rangle + f(\phi,\psi)\right]$$

for some positive $\lambda$ and all $\phi\in H$.
To establish the existence of a solution to Problem 2.1 it suffices to suppose the following:

Condition 2.1 (existence). $\{\phi\in D(f) : f(\phi,\phi) + \langle\phi,\phi\rangle \le 1\}$ is precompact (has compact closure) in $L^2$.

The precompactness restriction guarantees that we may extract a convergent sequence in the constraint set, with objectives that approximate the supremum. Moreover, the limit point of the convergent sequence used to approximate the supremum will necessarily be in the constraint set because the constraint set is convex and the form is closed.



2.3. Approximation. Why do we care about NPCs? One way to address this is to explore the construction of the best, finite-dimensional, least squares approximations. Specifically, suppose we wish to construct the best finite dimensional set of approximating functions for the space of functions that are square integrable with respect to a probability measure $Q$ with density $q$. We now motivate NPCs as the recursive solution to such a problem. The $N$-dimensional problem is solved by solving $N$ one-dimensional problems using a sequence of $H$'s that remove one dimension in each step. The outcome at each step is a NPC used as an additional approximating function.

Initially solve Problem 2.1 for $H = L^2$, select a solution $\psi_0$ and denote the maximized objective as $\lambda_0$. Inductively, given $\psi_0,\psi_1,\ldots,\psi_{j-1}$, form $H_{j-1}$ as the $j$ dimensional space generated by these $j$ solutions constructed recursively. Let $H_{j-1}^{\perp}$ denote the space of all elements of $L^2$ that are orthogonal to these $j$ solutions and hence orthogonal to $H_{j-1}$. Solve Problem 2.1 for $H = H_{j-1}^{\perp}$, select a solution $\psi_j$, and form $\lambda_j$ as the maximized value. The sequence $\{\lambda_j : j = 0,1,\ldots\}$ is decreasing because we are omitting components of the constraint set for the maximization problem as $j$ increases.

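In what sense is such a recursive procedure optimal? In answering this question, let $Proj(\phi|H)$ denote the least squares projection of $\phi$ onto the closed (in $L^2$) linear space $H$. The second moment of the approximation error is:

$$\langle\phi - Proj(\phi|H),\ \phi - Proj(\phi|H)\rangle = \langle\phi,\phi\rangle - \langle Proj(\phi|H),\ Proj(\phi|H)\rangle.$$

Claim 2.2. Let $H$ denote any $N$-dimensional subspace of $L^2$. Then

$$\max_{\theta\langle\phi,\phi\rangle + f(\phi,\phi)\le 1}\left\{\langle\phi,\phi\rangle - \langle Proj(\phi|H),\ Proj(\phi|H)\rangle\right\} \ge \lambda_N.$$

Our next result shows that the bound deduced in Claim 2.2 is attained by $H_{N-1}$.

Claim 2.3.

$$\max_{\theta\langle\phi,\phi\rangle + f(\phi,\phi)\le 1}\left\{\langle\phi,\phi\rangle - \langle Proj(\phi|H_{N-1}),\ Proj(\phi|H_{N-1})\rangle\right\} = \lambda_N.$$

Taken together, these two claims justify $H_{N-1}$ as a good $N$-dimensional space of approximating functions.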
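Claims 2.2 and 2.3 can be checked in a finite-dimensional analogue, assumed here purely for illustration, in which $L^2$ is replaced by $\mathbb{R}^m$ with the identity Gram matrix and $f(a,b) = a'Fb$ for a positive semidefinite matrix $F$. The NPCs are then eigenvectors of $F$ (the first one need not be constant), $\lambda_j = 1/(\theta+\delta_j)$, and the worst-case error over the constraint set is a maximized Rayleigh quotient. The helper `worst_case_error` is hypothetical.

```python
import numpy as np


def worst_case_error(F, theta, basis):
    """Finite-dimensional analogue (hypothetical): L^2 is R^m with identity Gram
    matrix and f(a, b) = a' F b for symmetric positive semidefinite F.
    Returns max a'(I - P) a subject to theta a'a + a'F a <= 1, where P is the
    orthogonal projection onto span(basis)."""
    m = F.shape[0]
    Q, _ = np.linalg.qr(basis)                   # orthonormal basis of the subspace
    resid = np.eye(m) - Q @ Q.T                  # I - P
    # Max of a'(I - P)a / a'(theta I + F)a: whiten by the constraint matrix,
    # then take the largest eigenvalue.
    L_inv = np.linalg.inv(np.linalg.cholesky(theta * np.eye(m) + F))
    M = L_inv @ resid @ L_inv.T
    return np.linalg.eigvalsh(0.5 * (M + M.T))[-1]


rng = np.random.default_rng(1)
A = rng.standard_normal((8, 8))
F = A @ A.T
theta, N = 0.5, 3
delta, V = np.linalg.eigh(F)                     # delta ascending; columns play the NPC role
lam = 1.0 / (theta + delta)                      # lambda_j, decreasing in j
best = worst_case_error(F, theta, V[:, :N])      # span of the first N "NPCs", i.e. H_{N-1}
other = worst_case_error(F, theta, rng.standard_normal((8, N)))
```

`best` should equal `lam[N]` (Claim 2.3), while the randomly chosen subspace can do no better (Claim 2.2).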

Remark 2.1. There exist $N$-dimensional spaces other than $H_{N-1}$ that attain the bound given in Claim 2.2. One reason is that there may be multiple solutions to Problem 2.1. Even when the solution to Problem 2.1 is unique, at each stage of the construction $\psi_{N-1}$ may be replaced by the sum of $\psi_{N-1}$ plus some $\tilde\psi$ that is orthogonal to all of the solutions to Problem 2.1 with $H = H_{N-1}^{\perp}$. Such a choice cannot necessarily be used in a recursive construction of optimal approximating spaces with dimension greater than $N$.

2.4. Nonlinear principal components revisited. In Problem 2.1, the constraint set gets larger as $\theta$ declines to zero. Reducing $\theta$ relaxes the constraint and enlarges the collection of functions that satisfy it. Thus the maximized objective increases as $\theta$ is reduced. While this is true, it turns out the maximizing choice of $\phi$ does not depend on $\theta$ up to scale. This follows because the ranking over $\phi$'s implied by the ratio:

$$\frac{\langle\phi,\phi\rangle}{\theta\langle\phi,\phi\rangle + f(\phi,\phi)}$$

does not depend on the value of $\theta$: the ratio equals $1/\left(\theta + f(\phi,\phi)/\langle\phi,\phi\rangle\right)$, a decreasing function of $f(\phi,\phi)/\langle\phi,\phi\rangle$ for every $\theta$. The same ranking is also implied by the ratio:

$$\frac{\langle\phi,\phi\rangle}{f(\phi,\phi)}$$

provided that $H$ is orthogonal to all constant functions. Thus a scaled solution $\psi$ to Problem 2.1 also solves:

Problem 2.2.

$$\max_{\phi\in H}\ \langle\phi,\phi\rangle$$

subject to:

$$f(\phi,\phi) = 1.$$
Restricting $H$ to be orthogonal to constant functions is equivalent to limiting attention to functions $\phi$ that have mean zero under the population data distribution $Q$. Recall that our construction of NPCs is based on the recursive application of this problem.

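From Claim 2.1 we know that $\psi$ satisfies:

$$\langle\phi,\psi\rangle = \lambda\left[\theta\langle\phi,\psi\rangle + f(\phi,\psi)\right]$$

for all $\phi\in H$. Rearranging terms,

$$f(\phi,\psi) = \delta\,\langle\phi,\psi\rangle$$

where

$$\delta = \frac{1-\lambda\theta}{\lambda}.$$

This is the eigenvalue associated with the NPC extraction. Solving for $\lambda$,

$$\lambda = \frac{1}{\delta+\theta}.$$

Since the eigenvalues $\delta$ of the form increase without bound, the corresponding sequence of $\lambda$'s converges to zero, guaranteeing that the approximation becomes arbitrarily accurate as the number of NPCs increases.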
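Combining Claim 2.3 with $\lambda_N = 1/(\delta_N+\theta)$, the residual second moment after projecting any $\phi\in D(f)$ on the first $N$ NPCs is at most $\lambda_N\,[\theta\langle\phi,\phi\rangle + f(\phi,\phi)]$. The sketch below checks this numerically in an assumed Gaussian case ($q$ standard normal, $\Sigma = 2$, NPCs $He_j/\sqrt{j!}$ with $\delta_j = j$) for the hypothetical test function $\phi(x) = \sin x$; `approximation_errors` is an illustrative name.

```python
import math

import numpy as np
from numpy.polynomial import hermite_e


def approximation_errors(phi, dphi, theta=1.0, J=15, nodes=80):
    """Hypothetical Gaussian illustration: q = N(0, 1), Sigma = 2, NPCs He_j / sqrt(j!)
    with delta_j = j. Returns the residual second moments after the first N = 1..J NPCs
    and the bounds lambda_N * (theta <phi, phi> + f(phi, phi)), lambda_N = 1 / (N + theta)."""
    x, w = hermite_e.hermegauss(nodes)
    w = w / np.sqrt(2.0 * np.pi)                       # N(0, 1) expectations
    second_moment = np.sum(w * phi(x) ** 2)            # <phi, phi>
    f_phi = np.sum(w * dphi(x) ** 2)                   # f(phi, phi) = (1/2) * 2 * E[phi'^2]
    coefs = np.array([
        np.sum(w * phi(x) * hermite_e.hermeval(x, [0] * j + [1])) / math.sqrt(math.factorial(j))
        for j in range(J)
    ])                                                 # <phi, psi_j>
    residual = second_moment - np.cumsum(coefs ** 2)   # error after psi_0, ..., psi_{N-1}
    N = np.arange(1, J + 1)
    bound = (theta * second_moment + f_phi) / (N + theta)
    return residual, bound


res, bd = approximation_errors(np.sin, np.cos)
```

The residuals fall well below the bound here because $\sin x$ is very smooth; the bound is the worst case over the whole constraint set.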

3. Existence. In this section we consider more primitive sufficient conditions that imply Condition 2.1, which, as we noted in Section 2, guarantees the existence of NPCs. We allow for noncompact state spaces and provide alternative restrictions on the tail behavior of the density $q$ and the penalization matrix $\Sigma$ that guarantee that the compactness criterion (Condition 2.1) is satisfied. Roughly speaking, when the tails of the density $q$ are exponentially thin, the compactness criterion can be established without requiring that the matrix $\Sigma$ become large (in the sense of positive definite matrices) in the tails. On the other hand, when the tails of $q$ are algebraic and hence thicker, divergence of $\Sigma$ in the tails can play an important role in establishing Condition 2.1.

We start by reviewing some known existence conditions, which we extend using two devices. First, we transform the function space and hence the (quadratic) form so that the distribution induced by $q$ is replaced by the Lebesgue measure. This transformation allows us to apply known results for forms built using the Lebesgue measure. Second, we study forms that are simpler but dominated by $f$. When the dominated forms satisfy Condition 2.1, the same can be said of $f$.
3.1. Compact Domain. Salinelli [31] established the existence of eigenfunctions by applying Rellich's compact embedding theorem when the domain $\Omega$ is compact with a continuous boundary. This approach requires a density $q$ that is bounded and bounded away from zero and a derivative penalty matrix $\Sigma$ that is uniformly nonsingular.

3.2. Real Line. Perhaps surprisingly, the NPC extraction is nontrivial 
even for densities on the real line. This is because our NPCs can be nonlinear 
functions of the underlying Markov state. We initially consider the case in 
which the state space is the real line. 

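Proposition 3.1. Suppose $\Omega = \mathbb{R}$ and

$$\int_{-\infty}^{0}\frac{dx}{\Sigma(x)\,q(x)} = \int_{0}^{\infty}\frac{dx}{\Sigma(x)\,q(x)} = +\infty, \qquad (3.1)$$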

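$$\lim_{|x|\to\infty}\left|\frac{\Sigma'(x)}{2\sqrt{\Sigma(x)}} + \sqrt{\Sigma(x)}\,\frac{q'(x)}{q(x)}\right| = +\infty. \qquad (3.2)$$

Then Condition 2.1 is satisfied.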



When $\Sigma$ is constant, the compactness condition (3.2) reduces to:

$$\lim_{|x|\to\infty}\frac{|q'(x)|}{q(x)} = +\infty,$$

which rules out densities with algebraic tails (tails that decay like $|x|$ raised to a negative power). By allowing for $\Sigma$ to increase, we can accommodate densities with algebraic tails. We now extend this analysis to higher dimensions.
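The constant-$\Sigma$ condition can be probed by evaluating $|q'(x)/q(x)| = |d\log q(x)/dx|$ at large $|x|$. The sketch below applies a central finite difference to three log-densities chosen for illustration (Gaussian, Cauchy and Laplace, each up to an additive constant); the helper `log_density_slope` is hypothetical.

```python
import numpy as np


def log_density_slope(log_q, x, eps=1e-5):
    """Central finite-difference approximation of |q'(x) / q(x)| = |d log q(x) / dx|."""
    return np.abs((log_q(x + eps) - log_q(x - eps)) / (2.0 * eps))


# Illustrative log-densities, up to additive constants.
gaussian = lambda x: -0.5 * x**2        # |q'/q| = |x|, diverges
cauchy = lambda x: -np.log1p(x**2)      # algebraic tails: |q'/q| = 2|x| / (1 + x^2) -> 0
laplace = lambda x: -np.abs(x)          # exponential tails: |q'/q| = 1 for x != 0

grid = np.array([10.0, 100.0, 1000.0])
gauss_slopes = log_density_slope(gaussian, grid)
cauchy_slopes = log_density_slope(cauchy, grid)
laplace_slopes = log_density_slope(laplace, grid)
```

Only the Gaussian slope diverges. The Cauchy slope decays to zero, and the Laplace slope stays at one, so exponential tails also fail this particular constant-$\Sigma$ condition.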



3.3. $\mathbb{R}^n$. In the subsections that follow, we will provide multivariate extensions for both sources of compactness: growth in the logarithmic derivative of the density $q$ and growth in the derivative penalty $\Sigma$. For simplicity, we will concentrate on the case where the state space is all of $\mathbb{R}^n$.

3.3.1. Cores. The compactness Condition 2.1 involves the domain of the form $f$, which is often rather complicated to describe. For this reason, we will focus on cases where this domain can be well approximated by smooth functions. The adequate notion of approximation is that of a core:
Definition 3.1. A family of functions $\mathcal{C}_o\subset D(f)$ is a core of $f$ if for any $\phi_0$ in the domain $D(f)$, there exists a sequence $\{\phi_j\}$ in $\mathcal{C}_o$ such that

$$\lim_{j\to\infty}\ \langle\phi_j - \phi_0,\ \phi_j - \phi_0\rangle + f(\phi_j - \phi_0,\ \phi_j - \phi_0) = 0.$$

Condition 3.1. $C_K^2$ is a core of $f$.

Let $\underline{f}$ denote the minimal extension, the smallest closed extension of the form $f_o$ defined in equation (2.1). Condition 3.1 is equivalent to $\underline{f} = f$.

Although their purpose was different, Fukushima et al. [3] provide a convenient sufficient condition that implies Condition 3.1 in environments that interest us. Define:

$$K(r) = \int_{|x|=1} x'\,\Sigma(rx)\,x\; q(rx)\, dS(x)$$

where $dS$ is the measure (surface element) used for integration on the sphere $|x| = 1$. For functions $\phi$ and $\psi$ in $C_K^2$ that are radially symmetric, i.e. $\phi(x) = \tilde\phi(|x|)$ and $\psi(x) = \tilde\psi(|x|)$, we may depict the form $f_o$ as an integral over radii:

$$f_o(\phi,\psi) = \frac{1}{2}\int_0^\infty \frac{d\tilde\phi}{dr}\,\frac{d\tilde\psi}{dr}\; r^{n-1}\,K(r)\,dr.$$

Proposition 3.2. Condition 3.1 is implied by:

$$\int_1^\infty K(r)^{-1}\,r^{1-n}\,dr = \infty. \qquad (3.3)$$



Restriction (3.3) implies the scalar restriction (3.1) of Proposition 3.1. This follows since for any positive reals $r_1$ and $r_2$,

$$\min\left\{\frac{1}{r_1},\frac{1}{r_2}\right\} \ge \frac{1}{r_1+r_2}.$$

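Notice that (3.3) is a joint restriction on $\Sigma$ and $q$. We may relate this condition to the moments of $q$ and the growth of $\Sigma$ using the (Cauchy-Schwarz) inequality:

$$\left(\int_1^\infty \frac{1}{r}\,dr\right)^2 \le \int_1^\infty K(r)^{-1}\,r^{1-n}\,dr\;\int_1^\infty K(r)\,r^{n-3}\,dr.$$

Since the left-hand side is infinite, a sufficient condition for (3.3) is that

$$\int_1^\infty \frac{K(r)}{r^2}\,r^{n-1}\,dr < \infty. \qquad (3.4)$$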
This latter inequality displays a tradeoff between growth in the penalization matrix and moments of the distribution. Define

$$\bar\varsigma^2(r) = \sup_{|x|=1} x'\,\Sigma(rx)\,x$$

and

$$\bar q(r) = \int_{|x|=1} q(rx)\,dS(x).$$

Notice that

$$K(r) \le \bar\varsigma^2(r)\,\bar q(r).$$

Suppose, for instance, $\bar\varsigma^2(r)$ is dominated by a quadratic function (in $r$). Then (3.4) and hence (3.3) are satisfied because the density $q$ is integrable:

$$\int_0^\infty \bar q(r)\,r^{n-1}\,dr = 1.$$

We may extend the previous argument by supposing instead that

$$\bar\varsigma^2(r) \le c\,r^{2+\delta}$$

for some positive $\delta$. Then

$$\frac{K(r)}{r^2} \le c\,r^{\delta}\int_{|x|=1} q(rx)\,dS(x).$$

Thus (3.4) is satisfied provided that

$$\int |x|^{\delta}\,q(x)\,dx < \infty.$$

Hence we can allow for faster growth in $\Sigma$ if $q$ has high enough moments.
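For $n = 2$ the spherical average $K(r)$ is an integral over the unit circle, which the periodic trapezoid rule computes accurately. The sketch assumes a standard bivariate normal $q$ and two hypothetical penalization matrices, an isotropic $\Sigma(x) = (1+|x|^2)I$ and an anisotropic $\Sigma(x) = \mathrm{diag}(1, 1+|x|^2)$; for both, $\bar\varsigma^2(r) = 1 + r^2$ grows quadratically. The function name `K_of_r` is an assumption.

```python
import numpy as np


def K_of_r(sigma, q, r, m=2000):
    """K(r) = int_{|x|=1} x' Sigma(r x) x q(r x) dS(x) for n = 2.

    sigma maps an (m, 2) array of points to (m, 2, 2) matrices; q maps (m, 2) to (m,).
    Equally spaced angles give a trapezoid rule that is very accurate for smooth
    periodic integrands."""
    t = 2.0 * np.pi * np.arange(m) / m
    u = np.stack([np.cos(t), np.sin(t)], axis=1)            # points with |u| = 1
    quad = np.einsum("mi,mij,mj->m", u, sigma(r * u), u)    # u' Sigma(r u) u
    return 2.0 * np.pi * np.mean(quad * q(r * u))


# Hypothetical inputs: standard bivariate normal q and two penalization matrices.
q_normal = lambda z: np.exp(-0.5 * np.sum(z**2, axis=1)) / (2.0 * np.pi)
sigma_iso = lambda z: (1.0 + np.sum(z**2, axis=1))[:, None, None] * np.eye(2)
sigma_aniso = lambda z: np.stack(
    [np.ones(len(z)), 1.0 + np.sum(z**2, axis=1)], axis=1)[:, :, None] * np.eye(2)
```

At $r = 2$ the isotropic case gives $K(2) = 5e^{-2}$, equal to the bound $\bar\varsigma^2(2)\,\bar q(2)$, while the anisotropic case gives $3e^{-2}$, strictly below it.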

So far we have produced a sufficient condition for approximation using functions in $C_K^2$ (Condition 3.1). We provide sufficient conditions for the original compactness condition (Condition 2.1) by transforming the probability measure.

3.3.2. Transforming the Measure. In this subsection we map the original probability space $L^2$ into a Lebesgue counterpart $L^2(leb)$. The transformation is standard (see Davies [3]), but it is often applied in the reverse direction. By using this transformation we may appeal to some existing mathematical results on compactness to establish Condition 2.1, which is equivalent to requiring that

$$\mathcal{U}_\theta = \{\phi\in D(f) : f(\phi,\phi) + \theta\langle\phi,\phi\rangle \le 1\}$$

is precompact in $L^2$ for some $\theta > 0$.

Given $q$ write:

$$q^{1/2} = \exp(-h).$$



NONLINEAR PRINCIPAL COMPONENTS AND THE LONG RUN 13 

Assumption 3.1. The function h is twice continuously differentiable. 

This assumption imposes some extra smoothness on the density, which was not required in our previous analysis.

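Map the space $L^2$ into $L^2(leb)$ by the (invertible) unitary transformation:

$$\psi = U\phi = \exp(-h)\,\phi.$$

Since $U$ is unitary, it suffices to show that $U(\mathcal{U}_\theta)$ is precompact. We will actually construct a set that contains $U(\mathcal{U}_\theta)$ and is precompact in $L^2(leb)$.

First notice that $U$ and $U^{-1}$ leave $C_K^2$ invariant, and for any $\psi\in C_K^2$ the corresponding $\phi = U^{-1}\psi$ satisfies:

$$\nabla\phi = \exp(h)\,(\psi\,\nabla h + \nabla\psi).$$

Thus

$$f(U^{-1}\psi, U^{-1}\psi^*) = \frac{1}{2}\int(\nabla\psi)'\Sigma(\nabla\psi^*) + \frac{1}{2}\int\left[\psi\,(\nabla h)'\Sigma(\nabla\psi^*) + \psi^*(\nabla h)'\Sigma(\nabla\psi)\right] + \frac{1}{2}\int(\nabla h)'\Sigma(\nabla h)\,\psi\psi^*.$$

Applying integration-by-parts to $\psi,\psi^*\in C_K^2$, it follows that

$$\frac{1}{2}\int\left[\psi\,(\nabla h)'\Sigma(\nabla\psi^*) + \psi^*(\nabla h)'\Sigma(\nabla\psi)\right] = -\frac{1}{2}\int\psi\psi^*\sum_{i,j=1}^n\frac{\partial}{\partial x_i}\left(\sigma_{ij}\frac{\partial h}{\partial x_j}\right).$$

Therefore,

$$f(U^{-1}\psi, U^{-1}\psi^*) = \frac{1}{2}\int(\nabla\psi)'\Sigma(\nabla\psi^*) + \int V\,\psi\psi^* \qquad (3.5)$$

where the potential function $V$ is given by:

$$V = \frac{1}{2}\left[(\nabla h)'\Sigma(\nabla h) - \sum_{i,j=1}^n\frac{\partial}{\partial x_i}\left(\sigma_{ij}\frac{\partial h}{\partial x_j}\right)\right]. \qquad (3.6)$$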
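Proposition 3.3. Suppose that $C_K^2$ is a core for $f$, $\psi = U\phi$ for some $\phi\in H$, and $V$ is bounded from below. Then $\psi$ is weakly differentiable,

$$\nabla\psi = \exp(-h)\,(-\phi\,\nabla h + \nabla\phi)$$

and

$$\frac{1}{2}\int(\nabla\phi)'\Sigma(\nabla\phi)\,q = \frac{1}{2}\int(\nabla\psi)'\Sigma(\nabla\psi) + \int V\psi^2. \qquad (3.7)$$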
A consequence of this proposition is that

$$\mathcal{V}_\theta = \left\{\psi\in L^2(leb) : \int(\theta + V)\psi^2 + \frac{1}{2}\int(\nabla\psi)'\Sigma(\nabla\psi) \le 1\right\} \supset U(\mathcal{U}_\theta),$$

and it thus suffices to show that $\mathcal{V}_\theta$ is precompact in $L^2(leb)$ for some $\theta > 0$.

We consider two methods for establishing that this last property is satisfied. We first focus on the behavior of the potential $V$ used in the quadratic form $\int(\theta + V)\psi^2$, and then we study extensions that exploit growth in the derivative penalty matrix $\Sigma$ used in the quadratic form $\int(\nabla\psi)'\Sigma(\nabla\psi)$.
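Representation (3.7) turns the form into a Schrödinger-type quadratic form on $L^2(leb)$. The sketch below is a one-dimensional illustration assuming a constant $\Sigma = \sigma^2$ and Dirichlet truncation to a finite interval; it discretizes $-\tfrac{\sigma^2}{2}\psi'' + V\psi$ by central differences. With $q$ standard normal and $\sigma^2 = 2$, $h = x^2/4$ up to a constant and $V = x^2/4 - 1/2$ diverges in the tails; the computed eigenvalues should approximate $0, 1, 2, \ldots$. The names `potential_1d` and `lowest_eigenvalues` are hypothetical.

```python
import numpy as np


def potential_1d(dh, d2h, sigma2):
    """V = (1/2)[Sigma h'^2 - (Sigma h')'] specialized to n = 1 and constant Sigma = sigma2."""
    return lambda x: 0.5 * sigma2 * (dh(x) ** 2 - d2h(x))


def lowest_eigenvalues(V, sigma2, L=12.0, m=1201, k=5):
    """Smallest k eigenvalues of -(sigma2/2) psi'' + V psi on [-L, L] with Dirichlet ends,
    the Lebesgue-space operator behind (1/2) int sigma2 psi'^2 + int V psi^2."""
    x = np.linspace(-L, L, m)
    dx = x[1] - x[0]
    main = sigma2 / dx**2 + V(x)                     # central second differences
    off = np.full(m - 1, -sigma2 / (2.0 * dx**2))
    H = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
    return np.linalg.eigvalsh(H)[:k]


# Hypothetical check: q = N(0, 1) gives h = x**2/4 + const; with sigma2 = 2,
# V = x**2/4 - 1/2 and the exact eigenvalues are 0, 1, 2, ...
V = potential_1d(lambda x: x / 2.0, lambda x: 0.5 * np.ones_like(x), 2.0)
vals = lowest_eigenvalues(V, 2.0)
```

The eigenvalues agree with $\delta_j = j$ for this Gaussian case, as the unitary map $U$ preserves the spectrum.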

3.3.3. Divergent Potential. In this section, we use the tail behavior of the potential $V$. To simplify the treatment of the term $\int(\nabla\psi)'\Sigma(\nabla\psi)$ in the definition of $\mathcal{V}_\theta$ we impose:

Assumption 3.2. The derivative penalty matrix $\Sigma \ge cI$ for some $c > 0$.

This assumption rules out cases in which the derivative penalty matrix diminishes to zero for arbitrarily large states.

We also suppose that the potential function diverges at the boundary:

Criterion 3.1. $\lim_{|x|\to\infty} V(x) = +\infty$.

Proposition 3.4. Under Assumptions 3.1 and 3.2, if Criterion 3.1 is satisfied, then Condition 2.1 is satisfied.

Direct verification of Criterion 3.1 may be difficult because formula (3.6) is a bit complicated. However, we may replace $\Sigma$ by a lower bound. Given Assumption 3.2 we can always construct a twice continuously differentiable function $\varsigma(x)$ with

$$\Sigma(x) \ge \varsigma(x)^2 I \ge cI, \quad \text{for some } c > 0. \qquad (3.8)$$

We now show how growth conditions on $\varsigma(x)$ can help in delivering compactness. Let:

$$\hat f_o(\phi,\phi^*) = \frac{1}{2}\int\nabla\phi(x)\cdot\nabla\phi^*(x)\,\varsigma(x)^2\,q(x)$$

on the space $C_K^2$. Then $\hat f_o(\phi,\phi) \le f_o(\phi,\phi)$ for all $\phi\in C_K^2$.

Let $\hat f$ be the minimal extension of $\hat f_o$. If $f$ is the minimal extension of $f_o$, then when equation (3.8) holds, the domain of $\hat f$ contains the domain of $f$. Applying Proposition 3.3 to $\hat f$, it suffices to use

$$\hat V(x) = \frac{\varsigma(x)^2}{2}\left(-\operatorname{trace}\left[\frac{\partial^2 h(x)}{\partial x_i\partial x_j}\right] - \frac{2\,\nabla\varsigma(x)\cdot\nabla h(x)}{\varsigma(x)} + |\nabla h(x)|^2\right)$$

in place of $V$ in demonstrating compactness.

Criterion 3.2. Equation (3.8) is satisfied and

$$\lim_{|x|\to\infty}\hat V(x) = +\infty.$$

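To derive some sufficient conditions for this criterion we parameterize $\varsigma$ as:

$$\varsigma(x) = \exp[v(x)].$$

Then an alternative formula for $\hat V$ is:

$$\hat V(x) = \frac{\varsigma(x)^2}{2}\left(-\operatorname{trace}\left[\frac{\partial^2 h(x)}{\partial x_i\partial x_j}\right] + |\nabla h(x) - \nabla v(x)|^2 - \nabla v(x)\cdot\nabla v(x)\right).$$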

An alternative to Criterion 3.2 is:

Criterion 3.3. Equation (3.8) is satisfied with $\varsigma(x) = \exp[v(x)]$ and,

a) $\displaystyle\lim_{|x|\to\infty}\frac{|\nabla v(x)|}{|\nabla h(x)|} = 0$;

b) $\displaystyle\lim_{|x|\to\infty}\varsigma(x)^2\left(-\operatorname{trace}\left[\frac{\partial^2 h(x)}{\partial x_i\partial x_j}\right] + \nabla h(x)\cdot\nabla h(x)\right) = +\infty$.

Proposition 3.5. Suppose Assumption 3.1 is satisfied. Then Criterion 3.3 implies Condition 2.1.



Restriction b) of Criterion 3.3 prevents the second derivative contribution from offsetting that of the squared gradient of $h$. This criterion is convenient to check when $h$ displays polynomial growth, or equivalently when $q$ has exponentially thin tails. Even if $|\nabla h|$ becomes arbitrarily small for large $|x|$, the compactness criterion can still be satisfied by having the penalization $\varsigma$ increase to more than offset this decline.
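Next we consider a way to exploit further growth in $\varsigma^2$. This approach gives us a way to enhance the potential function, and may be used when $\liminf_{|x|\to\infty}\varsigma(x) > 0$. Write

$$\int\varsigma^2\,\nabla\phi\cdot\nabla\phi = c\int\nabla\phi\cdot\nabla\phi + \int(\varsigma^2 - c)\,\nabla\phi\cdot\nabla\phi.$$

We now deduce a convenient lower bound on

$$\int(\varsigma^2 - c)\,\nabla\phi\cdot\nabla\phi,$$

following an approach of Davies (see Theorem 1.5.12). Construct an additional potential function:

$$W(x) = \frac{1}{2}\left(\varsigma(x)^2 + c\right)\left(\nabla v(x)\cdot\nabla v(x)\right) + \frac{1}{2}\left(\varsigma(x)^2 - c\right)\operatorname{trace}\left[\frac{\partial^2 v(x)}{\partial x_i\partial x_j}\right].$$

Lemma 3.1. If equation (3.8) holds, then:

$$\int W\phi^2 \le \frac{1}{2}\int(\varsigma^2 - c)\,\nabla\phi\cdot\nabla\phi \quad \text{for all } \phi\in C_K^2.$$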
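Lemma 3.1 can be sanity-checked numerically in one dimension. The sketch assumes the penalization of Example 3.1, $\varsigma^2 = c(1+x^2)^\beta$, so that $v' = \beta x/(1+x^2)$ and $v'' = \beta(1-x^2)/(1+x^2)^2$, and evaluates both sides of the inequality for two rapidly decaying test functions that stand in for elements of $C_K^2$. The helper `lemma_31_sides` and the test functions are hypothetical.

```python
import numpy as np


def lemma_31_sides(c, beta, phi, dphi, L=30.0, m=200_001):
    """n = 1 check of Lemma 3.1 with varsigma^2 = exp(2 v) = c (1 + x^2)^beta.
    Returns (int W phi^2, (1/2) int (varsigma^2 - c) phi'^2)."""
    x = np.linspace(-L, L, m)
    dx = x[1] - x[0]
    dv = beta * x / (1 + x**2)                       # v'
    d2v = beta * (1 - x**2) / (1 + x**2) ** 2        # v''
    s2 = c * (1 + x**2) ** beta                      # varsigma^2
    W = 0.5 * (s2 + c) * dv**2 + 0.5 * (s2 - c) * d2v
    # The integrands decay like exp(-x^2), so a Riemann sum on [-L, L] suffices.
    lhs = np.sum(W * phi(x) ** 2) * dx
    rhs = 0.5 * np.sum((s2 - c) * dphi(x) ** 2) * dx
    return lhs, rhs


lhs1, rhs1 = lemma_31_sides(1.0, 2.0,
                            lambda x: np.exp(-x**2 / 2) * (1 + x),
                            lambda x: np.exp(-x**2 / 2) * (1 - x - x**2))
lhs2, rhs2 = lemma_31_sides(0.5, 1.5,
                            lambda x: np.exp(-x**2),
                            lambda x: -2 * x * np.exp(-x**2))
```

The gap `rhs - lhs` equals $\tfrac12\int(\varsigma^2-c)(\phi' + \phi v')^2 \ge 0$, which is the completing-the-square step behind the bound.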



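Note that

$$\hat V(x) + W(x) = \frac{\varsigma(x)^2}{2}\left(\operatorname{trace}\left[\frac{\partial^2 v(x)}{\partial x_i\partial x_j} - \frac{\partial^2 h(x)}{\partial x_i\partial x_j}\right] + |\nabla h(x) - \nabla v(x)|^2\right) + \frac{c}{2}\left(\nabla v(x)\cdot\nabla v(x) - \operatorname{trace}\left[\frac{\partial^2 v(x)}{\partial x_i\partial x_j}\right]\right).$$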

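Criterion 3.4. Equation (3.8) is satisfied for $\varsigma(x) = \exp[v(x)]$ and,

a) $\displaystyle\lim_{|x|\to\infty}\left(\nabla v(x)\cdot\nabla v(x) - \operatorname{trace}\left[\frac{\partial^2 v(x)}{\partial x_i\partial x_j}\right]\right) = 0$;

b) $\displaystyle\lim_{|x|\to\infty}\varsigma(x)^2\left(\operatorname{trace}\left[\frac{\partial^2 v(x)}{\partial x_i\partial x_j} - \frac{\partial^2 h(x)}{\partial x_i\partial x_j}\right] + |\nabla h(x) - \nabla v(x)|^2\right) = +\infty$.

Proposition 3.6. Suppose Assumption 3.1 and Condition 3.1 are satisfied. Then Criterion 3.4 implies Condition 2.1.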
Restriction a) of Criterion 3.4 limits the tail growth of the penalization. There are two reasons that such growth should be limited. Fast growth in $\Sigma$ limits the functions that we hope to approximate using NPCs. Also, for $C_K^2$ to be a core for the form $f$ we require limits on growth in $\Sigma$ (see subsection 3.3.1).

Our use of $W$ in addition to $\hat V$ in effect replaces $-\frac{\varsigma^2}{2}|\nabla v|^2$ with a second derivative term:

$$\frac{\varsigma^2}{2}\operatorname{trace}\left[\frac{\partial^2 v}{\partial x_i\partial x_j}\right].$$

The following example illustrates the advantage of this replacement.

Example 3.1. Let

$$v(x) = \frac{\beta}{2}\log(1+|x|^2) + \frac{\tilde c}{2}$$

where $\tilde c = \log c$. Thus $\varsigma^2$ grows like $|x|^{2\beta}$ in the tails. Simple calculations result in

$$-\nabla v(x)\cdot\nabla v(x) = -\frac{\beta^2|x|^2}{(1+|x|^2)^2}$$

and

$$\operatorname{trace}\left[\frac{\partial^2 v(x)}{\partial x_i\partial x_j}\right] = \beta\,\frac{n + (n-2)|x|^2}{(1+|x|^2)^2}.$$

Notice that both terms converge to zero as $|x|$ gets large, but that the squared gradient scaled by $\varsigma^2$ becomes arbitrarily large when $\beta > 1$. The first term is always negative, but the second one is nonnegative provided that $n \ge 2$. Even when $n = 1$ the second term is larger than the first provided that $\beta > 1$.¹ This example illustrates when Criterion 3.4 is preferred to Criterion 3.3. The distinction can be important when densities have algebraic tails.
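The contrast between Criteria 3.3 and 3.4 for algebraic tails shows up in a one-dimensional calculation. The sketch assumes, for illustration only, $q \propto (1+x^2)^{-\alpha}$, so that $h = \tfrac{\alpha}{2}\log(1+x^2)$ up to a constant, together with the penalization of Example 3.1. The ratio $|v'|/|h'|$ tends to $\beta/\alpha$ rather than zero, so restriction a) of Criterion 3.3 fails, while with $\alpha = 3$ and $\beta = 2$ the expression in restriction b) of Criterion 3.4 grows like $2x^2$. The helper `criterion_terms` is hypothetical.

```python
import numpy as np


def criterion_terms(x, alpha, beta, c=1.0):
    """n = 1 illustration: q proportional to (1 + x^2)^(-alpha), so
    h = (alpha/2) log(1 + x^2) + const, and varsigma^2 = c (1 + x^2)^beta,
    i.e. v = (beta/2) log(1 + x^2) + (log c)/2 as in Example 3.1."""
    r = 1 + x**2
    dh, d2h = alpha * x / r, alpha * (1 - x**2) / r**2
    dv, d2v = beta * x / r, beta * (1 - x**2) / r**2
    s2 = c * r**beta
    ratio_33a = np.abs(dv) / np.abs(dh)                  # Criterion 3.3 a) needs this -> 0
    term_33b = s2 * (-d2h + dh**2)                       # expression in Criterion 3.3 b)
    term_34a = dv**2 - d2v                               # expression in Criterion 3.4 a)
    term_34b = s2 * (d2v - d2h + (dh - dv) ** 2)         # expression in Criterion 3.4 b)
    return ratio_33a, term_33b, term_34a, term_34b


r33a, t33b, t34a, t34b = criterion_terms(np.array([1e2, 1e3]), alpha=3.0, beta=2.0)
```

The choice $\alpha > \beta > 1$ here is only one convenient configuration; other parameter values change the limits above.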



This section contains our main existence results, which we now summarize. We provided two criteria for constructing penalization functions that support the existence of countably many NPCs. The first one, Criterion 3.3, gives the most flexibility in terms of the penalization matrix $\Sigma$, but it is applicable only for densities that have relatively thin tails; densities with algebraic tails are precluded. The second one, Criterion 3.4, allows for densities with algebraic tails but requires that the penalization be more severe in the extremes to compensate for the tail thickness. Making the penalization more potent limits the class of functions that are approximated. Moreover, when the penalization is too extreme, we encounter an additional approximation problem: the family of functions $C_K^2$ ceases to be a core for the form used in the NPC extraction.

¹We have previously established an alternative compactness criterion for $n = 1$ that does not involve second derivatives and that may be preferred to Criterion 3.4.

4. Forms and Markov processes. So far we considered the role of the penalization matrix $\Sigma$ in the construction and approximation properties of NPCs. We now use stationary Markov diffusions to give an explicit interpretation of this penalization matrix.

We proceed as follows. Suppose the data $\{x_j\}$ are generated by a Markov diffusion by sampling, say, at integer points in time. Specifically, $\{x_t : t\ge 0\}$ solves

$$dx_t = \mu(x_t)\,dt + \Lambda(x_t)\,dB_t$$

for some $n$-dimensional vector function $\mu$ and some $n$ by $n$ matrix function $\Lambda$ of the Markov state with appropriate boundary restrictions, where $\{B_t : t\ge 0\}$ is an $n$-dimensional, standard Brownian motion. Suppose further that this process has $q$ as its stationary density and that $\Sigma = \Lambda\Lambda'$. We will have more to say in Section 6 about the restrictions on $\mu$ that are implicit in such a construction. Let $\phi$ be in $C_K^2$. Then it follows from Ito's lemma that the local variance of the process $\{\phi(x_t)\}$ is

$$(\nabla\phi)'\Sigma(\nabla\phi),$$

which is state dependent. Note that $f(\phi,\phi)$ is one-half of the average of this local variance. The local variance is a measure of the magnitude of the instantaneous forecast error in forecasting $\{\phi(x_t)\}$ over the next instant given the current Markov state.
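The interpretation of $f(\phi,\phi)$ as one-half of an average local variance can be checked by simulation. The sketch assumes the scalar Ornstein-Uhlenbeck process $dx_t = -x_t\,dt + \sqrt{2}\,dB_t$, whose stationary density is standard normal and for which $\Sigma = 2$, and uses an Euler-Maruyama discretization over many independent paths started from the stationary density. For $\phi(x) = x^2$ the local variance is $(2x)^2\cdot 2 = 8x^2$, so its stationary average is $8$ and $f(\phi,\phi) = 4$. The function `average_local_variance` is hypothetical.

```python
import numpy as np


def average_local_variance(phi, mu, lam, x0, dt, steps, rng):
    """Euler-Maruyama simulation of the scalar SDE dx = mu(x) dt + lam(x) dB, run on
    many paths at once. Returns the realized quadratic variation of phi(x_t) divided
    by elapsed time, averaged across paths: an estimate of E[Sigma phi'^2] = 2 f(phi, phi)."""
    x = x0.copy()
    qv = np.zeros_like(x)
    for _ in range(steps):
        x_new = x + mu(x) * dt + lam(x) * np.sqrt(dt) * rng.standard_normal(x.shape)
        qv += (phi(x_new) - phi(x)) ** 2             # squared increments of phi(x_t)
        x = x_new
    return np.mean(qv) / (steps * dt)


# Hypothetical example: dx = -x dt + sqrt(2) dB, stationary N(0, 1), Sigma = 2.
rng = np.random.default_rng(0)
x0 = rng.standard_normal(4000)                       # start from the stationary density
est = average_local_variance(lambda x: x**2, lambda x: -x,
                             lambda x: np.sqrt(2.0) * np.ones_like(x),
                             x0, dt=2e-3, steps=1000, rng=rng)
```

Halving `est` gives a simulation estimate of $f(\phi,\phi)$; the Euler step introduces a small bias of order `dt`.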

The NPC extraction given by Definition 2.1 can be performed equivalently as:

Definition 4.1. The function $\psi_j$ is the $j$th nonlinear principal component (NPC) for $j\ge 1$ if $\psi_j$ solves:

$$\min_{\phi\in H}\ f(\phi,\phi)$$

subject to

$$\langle\phi,\phi\rangle = 1, \qquad \langle\psi_s,\phi\rangle = 0,\quad s = 0,\ldots,j-1.$$
Thus the NPCs are extracted by making the local forecast error (appropriately scaled) small for functions with unit second moments, subject to orthogonality. This is a continuous-time counterpart to $(1 - R^2)$ in a forecasting regression. Recall that the NPCs satisfy $\langle\psi_j,\psi_k\rangle = 0$ and $f(\psi_j,\psi_k) = 0$ for $j \ne k$. These properties are nonlinear counterparts to the canonical components in the extraction of Box and Tiao [7]. Box and Tiao [7] show that their canonical analysis produces $k$ component series that (i) are ordered from least predictable to most predictable, (ii) are contemporaneously uncorrelated, and (iii) have contemporaneously uncorrelated forecast errors. In verifying our counterpart to the third property, notice that in continuous time the unpredictable component is $\nabla\psi_j(x_t)'\Lambda(x_t)\,dB_t$, and thus $f(\psi_j,\psi_k)$ is proportional to the (average) local covariance of $\psi_j(x_t)$ and $\psi_k(x_t)$.
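The two orthogonality properties can be illustrated in the scalar case with $q$ the standard normal density and $\Sigma = 1$. This is an assumption made only for the illustration, and it corresponds to the reversible drift $-x/2$. In that case the Hermite polynomials $He_j$ are known to be the NPCs. The sketch below uses Gauss-Hermite quadrature to check three things:

- $\langle\psi_j,\psi_k\rangle$ vanishes for $j \ne k$;
- $f(\psi_j,\psi_k)$, computed with $f(\phi,\psi) = \frac12\int(\nabla\phi)'\Sigma(\nabla\psi)q$, also vanishes for $j \ne k$;
- the ratio $f(\psi_j,\psi_j)/\langle\psi_j,\psi_j\rangle$ equals $j/2$.

```python
import numpy as np
from numpy.polynomial import hermite_e as He

K = 5                                   # candidate NPCs psi_0, ..., psi_{K-1}
nodes, weights = He.hermegauss(40)      # quadrature for the weight exp(-x**2 / 2)
weights = weights / np.sqrt(2 * np.pi)  # now integrates against the N(0, 1) density q
sigma2 = 1.0                            # scalar, constant Sigma (illustrative)

coef = np.eye(K)                        # row j holds the coefficients of He_j
vals = np.array([He.hermeval(nodes, c) for c in coef])               # He_j at the nodes
ders = np.array([He.hermeval(nodes, He.hermeder(c)) for c in coef])  # He_j' at the nodes

gram = (vals * weights) @ vals.T                 # <psi_j, psi_k>
form = 0.5 * sigma2 * (ders * weights) @ ders.T  # f(psi_j, psi_k)

off = ~np.eye(K, dtype=bool)
print(np.abs(gram[off]).max(), np.abs(form[off]).max())  # both numerically zero
print(np.diag(form) / np.diag(gram))                     # eigenvalues delta_j = j / 2
```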

For financial and economic applications it is important to allow for barriers that are not attracting, and it is desirable to allow for a non-compact state space of the Markov process. Thus imposing uniform bounds on both $q$ and the matrix $\Sigma$ over compact state spaces is too restrictive. Our existence results in Section 3 avoid such restrictions.

Our construction of NPCs supports the estimation and testing of multivariate Markov diffusion models. There are other functional principal components constructions. For instance, Dauxois and Nkiet construct nonlinear principal components for multivariate densities by choosing pairs of functions that maximize cross correlations without penalizing derivatives. Zhou and He [34] propose $L_1$-norm constrained principal components for the purpose of dimension reduction and variable filtering. Ramsay and Silverman [28] provide detailed discussions of functional principal component analysis for IID realizations of curves.

5. Reversible diffusions. We next consider how to use the form $f$ to build a Markov process. Specifically, associated with the form $f$ there is a second-order differential operator $F$ that generates the semigroup of a Markov diffusion. The diffusion process has $\Sigma$ as its local covariance matrix and $q$ as its stationary density. The construction of $F$ is unique provided that we restrict the process to be time reversible.

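5.1. A differential operator. There is a differential operator $F_o$ that is associated with the form $f_o$ (given in (2.1)), which we construct using integration by parts. For any functions $\phi$ and $\psi$ in $C_K^2$:

$$f_o(\phi,\psi) = \frac12\int(\nabla\phi)'\Sigma(\nabla\psi)\,q = -\frac12\int\phi\left(\sum_{i,j}\sigma_{ij}\frac{\partial^2\psi}{\partial y_i\partial y_j}\right)q - \frac12\int\phi\left(\sum_{i,j}\frac{\partial(q\sigma_{ij})}{\partial y_i}\frac{\partial\psi}{\partial y_j}\right), \qquad (5.1)$$

where the second equality of (5.1) follows from the integration-by-parts formula:

$$\int\frac{\partial\phi}{\partial y_i}\,q\sigma_{ij}\frac{\partial\psi}{\partial y_j} = -\int\phi\,q\sigma_{ij}\frac{\partial^2\psi}{\partial y_i\partial y_j} - \int\phi\,\frac{\partial(q\sigma_{ij})}{\partial y_i}\frac{\partial\psi}{\partial y_j}.$$

We use (5.1) to motivate our interest in the differential operator $F_o$:

$$F_o\phi = -\frac12\sum_{i,j}\sigma_{ij}\frac{\partial^2\phi}{\partial y_i\partial y_j} - \frac{1}{2q}\sum_{i,j}\frac{\partial(q\sigma_{ij})}{\partial y_i}\frac{\partial\phi}{\partial y_j}. \qquad (5.2)$$

This operator is constructed so that the form $f_o$ can be represented as:

$$f_o(\phi,\psi) = \langle F_o\phi,\psi\rangle = \langle\phi,F_o\psi\rangle,$$

where the second relation holds because we can interchange the roles of $\phi$ and $\psi$ in (5.1). Notice from (5.2) that the operator has both a first-derivative term and a second-derivative term. Symmetry (with respect to $q$) is built into the construction of this operator because of its link to the symmetric form $f_o$.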
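The symmetry of $F_o$ with respect to $q$ can be checked symbolically for one concrete case. The sketch below uses SymPy and takes a scalar state with $q(y) = e^{-y^2}/\sqrt{\pi}$ and $\Sigma(y) = 1 + y^2$. Both choices are hypothetical and made only for illustration. It evaluates $\langle F_o\phi,\psi\rangle$, $\langle\phi,F_o\psi\rangle$ and $f_o(\phi,\psi)$ for $\phi = y$ and $\psi = y^3$.

```python
import sympy as sp

y = sp.symbols("y", real=True)
q = sp.exp(-y**2) / sp.sqrt(sp.pi)  # illustrative stationary density on R
s2 = 1 + y**2                       # illustrative scalar Sigma(y)

def F_o(phi):
    """Scalar version of (5.2)."""
    expr = (-sp.Rational(1, 2) * s2 * sp.diff(phi, y, 2)
            - sp.diff(q * s2, y) / (2 * q) * sp.diff(phi, y))
    return sp.simplify(expr)

def inner(a, b):
    """<a, b> = integral of a * b * q over the real line."""
    return sp.simplify(sp.integrate(sp.expand(a * b * q), (y, -sp.oo, sp.oo)))

phi, psi = y, y**3
lhs = inner(F_o(phi), psi)   # <F_o phi, psi>
rhs = inner(phi, F_o(psi))   # <phi, F_o psi>
form = sp.simplify(sp.integrate(
    sp.expand(sp.Rational(1, 2) * sp.diff(phi, y) * s2 * sp.diff(psi, y) * q),
    (y, -sp.oo, sp.oo)))     # f_o(phi, psi) from (5.1)
print(lhs, rhs, form)        # all three agree
```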

We are interested in the operator $F_o$ because of its use in modeling Markov diffusions. Suppose that $\{x_t : t \ge 0\}$ solves the stochastic differential equation:

$$dx_t = \mu(x_t)\,dt + \Lambda(x_t)\,dB_t$$

with appropriate boundary restrictions, where $\{B_t : t \ge 0\}$ is an $n$-dimensional standard Brownian motion, and:

$$\mu_j = \frac{1}{2q}\sum_{i=1}^n\frac{\partial(q\sigma_{ij})}{\partial y_i}, \quad j = 1,\ldots,n,$$

where $\sigma_{ij}$ are the entries of $\Sigma = \Lambda\Lambda'$. Then we may use Itô's lemma to show that for each $\phi \in C_K^2$

$$-F_o\phi = \lim_{t\downarrow 0}\frac{E[\phi(x_t)\mid x_0 = x] - \phi(x)}{t},$$

where this limit is taken with respect to the $L^2$ norm. That is, $-F_o$ coincides with the infinitesimal generator of $\{x_t\}$ on $C_K^2$. We use this link to the stochastic differential equation to motivate our use of the matrix $\Sigma$ for penalizing derivatives. This matrix will also be the diffusion matrix for a continuous-time Markov process with stationary density $q$.
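In the scalar case the two claims just made reduce to algebraic identities. The claims are that the drift $\mu = (q\Sigma)'/(2q)$ makes $-F_o$ the generator, and that $q$ is then stationary. The SymPy sketch below checks both for an unspecified density $q$ and diffusion coefficient $\Sigma$, represented here as generic symbolic functions. The stationarity check uses the Fokker-Planck equation $-(\mu q)' + \frac12(\Sigma q)'' = 0$.

```python
import sympy as sp

y = sp.symbols("y", real=True)
q = sp.Function("q")(y)        # generic stationary density
s2 = sp.Function("sigma2")(y)  # generic scalar diffusion coefficient Sigma
phi = sp.Function("phi")(y)    # generic test function

mu = sp.diff(q * s2, y) / (2 * q)  # drift implied by (q, Sigma), scalar case

# Generator of dx = mu dt + sqrt(Sigma) dB applied to phi.
generator_phi = mu * sp.diff(phi, y) + sp.Rational(1, 2) * s2 * sp.diff(phi, y, 2)

# F_o phi from (5.2), scalar case.
F_o_phi = (-sp.Rational(1, 2) * s2 * sp.diff(phi, y, 2)
           - sp.diff(q * s2, y) / (2 * q) * sp.diff(phi, y))

# Stationary Fokker-Planck residual.
fp_residual = -sp.diff(mu * q, y) + sp.Rational(1, 2) * sp.diff(s2 * q, y, 2)

print(sp.simplify(generator_phi + F_o_phi))  # generator equals -F_o
print(sp.simplify(fp_residual))              # q is stationary
```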

5.2. Generating reversible diffusions. Wong [33] constructed scalar diffusion models with stationary densities in the Pearson class and produced a spectral or eigenfunction decomposition of the associated one-parameter semigroup of conditional expectation operators. Banon [3] and Cobb et al. [10] extended this analysis, in part by taking the diffusion coefficient as given and constructing the implied drift coefficient for the stochastic differential equation that generates a prescribed stationary density. Banon [3] and Cobb et al. [10] did not analyze the implied spectral decomposition of the associated conditional expectation operators. In all these analyses, the stationary density of the diffusion process is taken as one of the starting points of a model builder. In this section we share the aim for generality of Banon [3], while retaining the interest of Wong [33] in spectral decompositions.

As in Banon [3], Cobb et al. [10] and Wong [33], we parameterize diffusion processes using the stationary density $q$ and a (possibly state dependent) diffusion coefficient $\Sigma$, in contrast to the more typical approach of starting with the drift and diffusion coefficients. Unlike Banon [3], Cobb et al. [10] and Wong [33], we allow the diffusion process to be multivariate on a state space $\Omega$. For this to result in a unique diffusion, we require that the diffusion be time reversible.

A stochastic process is time reversible if its forward and backward transition probabilities are the same. Multivariate reversible diffusions can be parameterized directly by the pair $(q,\Sigma)$. Associated with the closed extension $f$ is a family of resolvent operators $G_\alpha$ indexed by a positive parameter $\alpha$. We use the resolvent operators to build a semigroup of conditional expectation operators for a Markov process and, in particular, the generator of that semigroup.

For any $\alpha > 0$, the resolvent operator $G_\alpha$ is constructed as follows. Given a function $\phi \in L^2$, define $G_\alpha\phi \in \mathcal{D}(f)$ to be the solution to

$$f(G_\alpha\phi,\psi) + \alpha\langle G_\alpha\phi,\psi\rangle = \langle\phi,\psi\rangle \qquad (5.3)$$

for all $\psi \in \mathcal{D}(f)$. The Riesz Representation Theorem guarantees the existence of $G_\alpha\phi$. This family of resolvent operators is known to satisfy several convenient restrictions (e.g., see Fukushima et al. [18], pages 15 and 19). In particular, $G_\alpha$ is a one-to-one mapping from $L^2$ onto $G_\alpha(L^2)$.
We associate with the form $f$ the self-adjoint, positive semidefinite operator:

$$F\phi = (G_\alpha)^{-1}\phi - \alpha\phi \qquad (5.4)$$

defined on the domain $G_\alpha(L^2)$. It can be shown that $F$ is independent of $\alpha$. Since the operator $F$ is self-adjoint and positive semidefinite, we may define a unique positive semidefinite square root $\sqrt{F}$. While $F$ may only be defined on a reduced domain, the domain of its square root may be extended uniquely to the entire space $\mathcal{D}(f)$ and

$$f(\phi,\psi) = \langle\sqrt{F}\phi,\sqrt{F}\psi\rangle$$

(e.g., see Fukushima et al. [18], Theorem 1.3.1). Moreover, $F$ is an extension of the operator $F_o$ because $f$ is an extension of $f_o$ (e.g., see Lemma 3.3.1 of Fukushima et al. [18]).

We also use the family of resolvent operators to build a semigroup of conditional expectation operators. A natural candidate for this semigroup is $\{\exp(-tF) : t \ge 0\}$. Formally, the expression $\exp(-tF)$ is not well defined as a series expansion, because $F$ is unbounded. However, for any $\alpha$ and any $t$, we may form the exponential

$$\exp(t\alpha^2 G_\alpha - \alpha t I)$$

as a convergent power series, since $G_\alpha$ is bounded. Notice that (5.4) implies

$$t\alpha^2 G_\alpha - t\alpha I = t\alpha\left[\left(I + \tfrac{1}{\alpha}F\right)^{-1} - I\right] = -tF\left(I + \tfrac{1}{\alpha}F\right)^{-1}.$$

Instead of directly using a series expansion, we use the limit

$$\lim_{\alpha\to\infty}\exp\left[t\alpha^2 G_\alpha - \alpha t I\right] = \exp(-tF),$$

often referred to as the Yosida approximation, to construct formally a strongly continuous semigroup of operators indexed by $t \ge 0$.
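A finite-dimensional stand-in makes the Yosida construction concrete. The minimal sketch below replaces $F$ by a random symmetric positive semidefinite matrix with hypothetical eigenvalues, one of them zero to mimic the constant function. It checks that $\exp(t\alpha^2 G_\alpha - \alpha t I)$ approaches $\exp(-tF)$ as $\alpha$ grows, and it confirms the algebraic identity displayed above.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)
n = 6
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))  # random orthonormal eigenvectors
delta = np.array([0.0, 0.3, 0.7, 1.5, 2.0, 4.0])   # hypothetical eigenvalues of F
F = Q @ np.diag(delta) @ Q.T                        # symmetric PSD stand-in for F
I = np.eye(n)
t = 1.0

def resolvent(alpha):
    """G_alpha = (alpha I + F)^{-1}, the matrix analogue of (5.4)."""
    return np.linalg.inv(alpha * I + F)

def yosida(alpha):
    """exp(t alpha^2 G_alpha - alpha t I)."""
    return expm(t * alpha**2 * resolvent(alpha) - alpha * t * I)

target = expm(-t * F)
alphas = (1.0, 10.0, 100.0, 1000.0)
errors = [np.linalg.norm(yosida(a) - target, 2) for a in alphas]
for a, e in zip(alphas, errors):
    print(f"alpha = {a:7.1f}   ||Yosida - exp(-tF)|| = {e:.2e}")

# Identity used in the text: t alpha^2 G - t alpha I = -tF (I + F / alpha)^{-1}.
alpha = 10.0
lhs = t * alpha**2 * resolvent(alpha) - t * alpha * I
rhs = -t * F @ np.linalg.inv(I + F / alpha)
print(np.allclose(lhs, rhs))
```

Because $G_\alpha$ and $F$ share eigenvectors, the error is governed eigenvalue by eigenvalue by $|\exp(-t\delta\alpha/(\alpha+\delta)) - \exp(-t\delta)|$, which shrinks as $\alpha \to \infty$.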

We have just seen how to construct resolvent operators and the semigroup of conditional expectation operators from the form. We may invert this latter relation and obtain:

$$G_\alpha\phi = \int_0^\infty\exp(-\alpha t)\exp(-tF)\phi\,dt, \qquad (5.5)$$

which is the usual formula for the resolvents of a semigroup of operators. The operator $-F$ is referred to as the generator of both the semigroup $\{\exp(-tF) : t \ge 0\}$ and the family of resolvent operators $\{G_\alpha : \alpha > 0\}$.

As we have just seen, associated with a closed form $f$ there is an operator $F$ and a (strongly continuous) semigroup $\{\exp(-tF) : t \ge 0\}$ on $L^2$. To establish that there is a Markov process associated with this semigroup, we need first to verify that the semigroup satisfies two properties.

- First, we require that for each $t > 0$ and each $0 \le \phi \le 1$ in $L^2$, $0 \le \exp(-tF)\phi \le 1$. A semigroup satisfying this property is called submarkov in the language of Beurling and Deny [5].
- Second, we require that for each $t > 0$, $\exp(-tF)1 = 1$. A semigroup satisfying this property is said to conserve probabilities.

We refer to a submarkov semigroup that conserves probabilities as a Markov semigroup. Finally, we must make sure that the Markov semigroup is actually the family of conditional expectation operators of a Markov process.
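A finite-state analogue shows what the two properties mean in practice. This is an assumption made purely for illustration, since the setting of the paper is a diffusion. A reversible continuous-time chain with stationary distribution $\pi$ can be built from symmetric "conductances" $c_{ij} = \pi_i A_{ij}$. With $F = -A$, the matrix $\exp(-tF)$ maps functions to their conditional expectations. The sketch checks that it is submarkov, conserves probabilities, is symmetric with respect to $\pi$, and leaves $\pi$ invariant. The numbers in `pi` and `C` are hypothetical.

```python
import numpy as np
from scipy.linalg import expm

pi = np.array([0.1, 0.2, 0.3, 0.4])       # hypothetical stationary distribution
C = np.array([[0.0, 0.3, 0.1, 0.2],
              [0.3, 0.0, 0.4, 0.1],
              [0.1, 0.4, 0.0, 0.5],
              [0.2, 0.1, 0.5, 0.0]])       # symmetric conductances c_ij = pi_i A_ij
A = C / pi[:, None]                        # off-diagonal jump rates A_ij = c_ij / pi_i
np.fill_diagonal(A, -A.sum(axis=1))        # rows sum to zero, so A 1 = 0
F = -A                                     # finite-state analogue of F

t = 0.7
P = expm(-t * F)                           # exp(-tF): (P phi)_i = E[phi(x_t) | x_0 = i]

rng = np.random.default_rng(0)
phi = rng.uniform(0.0, 1.0, size=(4, 1000))  # many test functions with 0 <= phi <= 1
Pphi = P @ phi
print(P.sum(axis=1))                       # conserves probabilities: rows sum to 1
print(Pphi.min() >= 0.0, Pphi.max() <= 1.0)  # submarkov property
print(np.allclose(np.diag(pi) @ P, (np.diag(pi) @ P).T))  # self-adjoint in L^2(pi)
print(np.allclose(pi @ P, pi))             # pi is stationary
```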

The following condition is sufficient for a closed form to generate a submarkov semigroup (e.g., see Davies [14], Section 1.3).

Condition 5.1 (Beurling-Deny). For any $\phi \in \mathcal{D}(f)$, the function $\psi$ given by the truncation

$$\psi = (0 \vee \phi) \wedge 1$$

is in $\mathcal{D}(f)$ and

$$f(\psi,\psi) \le f(\phi,\phi).$$

When this condition is satisfied, the semigroup $\exp(-tF)$ is submarkov and, for each $t > 0$, $\exp(-tF)$ is an $L^2$ contraction ($\|\exp(-tF)\phi\|_2 \le \|\phi\|_2$). This contraction property is also satisfied for the $L^p$ norm for $1 \le p \le \infty$ (Davies [14], Theorem 1.3.3). In particular, we may extend the semigroup from $L^2$ to $L^1$ while preserving the contraction property.
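For the same kind of hypothetical finite-state reversible chain, the Beurling-Deny condition can be seen directly. The form $f(\phi,\phi) = -\langle\phi,A\phi\rangle_\pi$ equals $\frac12\sum_{i,j}c_{ij}(\phi_i-\phi_j)^2$. Truncating $\phi$ to $[0,1]$ can only shrink each difference $|\phi_i - \phi_j|$, so it can only shrink the form. The sketch checks this on random functions, confirms that the two expressions for the form agree, and verifies the $L^2(\pi)$ contraction property of $\exp(-tF)$. The chain is an illustrative assumption, not part of the paper.

```python
import numpy as np
from scipy.linalg import expm

pi = np.array([0.1, 0.2, 0.3, 0.4])  # hypothetical stationary distribution
C = np.array([[0.0, 0.3, 0.1, 0.2],
              [0.3, 0.0, 0.4, 0.1],
              [0.1, 0.4, 0.0, 0.5],
              [0.2, 0.1, 0.5, 0.0]])  # symmetric conductances c_ij = pi_i A_ij
A = C / pi[:, None]
np.fill_diagonal(A, -A.sum(axis=1))   # generator; F = -A

def form(phi):
    """f(phi, phi) = (1/2) sum_ij c_ij (phi_i - phi_j)^2."""
    d = phi[:, None] - phi[None, :]
    return 0.5 * np.sum(C * d**2)

def norm_pi(phi):
    """L^2(pi) norm."""
    return np.sqrt(np.sum(pi * phi**2))

P = expm(0.7 * A)                     # exp(-tF) with t = 0.7
rng = np.random.default_rng(0)
phis = rng.normal(0.5, 1.0, size=(1000, 4))  # many values fall outside [0, 1]

truncation_ok = all(form(np.clip(p, 0.0, 1.0)) <= form(p) + 1e-12 for p in phis)
forms_agree = all(np.isclose(form(p), -np.sum(pi * p * (A @ p))) for p in phis)
contraction_ok = all(norm_pi(P @ p) <= norm_pi(p) + 1e-12 for p in phis)
print(truncation_ok, forms_agree, contraction_ok)
```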



Proposition 5.1. There exists a self-adjoint operator $F$ associated with $f$, which is an extension of $F_o$ and generates a semigroup $\{\exp(-tF) : t \ge 0\}$. The density $q$ is the stationary density for this diffusion, the matrix $\Sigma$ is the diffusion matrix, and $\exp(-tF)$ is the conditional expectation operator over an interval of time $t$.

5.3. Nonlinear principal components and eigenfunctions. Continuous-time Markov process models are typically specified in terms of their local dynamics. Given the nonlinearity in the state variables, it is a nontrivial task to infer the global dynamics, and in particular the long-run behavior, from this local specification. Characterizing eigenfunctions of conditional expectation operators offers a way of approximating intermediate and long-term dynamics in ways that are typically hidden in the local dynamics of nonlinear models.

Eigenfunctions of the closed form $f$ will also be eigenfunctions of the resolvent operators $G_\alpha$ and of the generator $F$. For convenience, we rewrite equation (5.3):

$$f(G_\alpha\phi,\psi) + \alpha\langle G_\alpha\phi,\psi\rangle = \langle\phi,\psi\rangle.$$

From this formula, we may verify that $f$ and $G_\alpha$ must share eigenfunctions for any $\alpha > 0$. The eigenvalues are related via the formula

$$\lambda = \frac{1}{\delta + \alpha},$$

where $\lambda$ is the eigenvalue of $G_\alpha$ and $\delta$ is the corresponding eigenvalue of $f$. Given the relation between the generator $F$ and the resolvent operator $G_\alpha$,

$$F\phi = (G_\alpha)^{-1}\phi - \alpha\phi,$$

these two operators must share eigenfunctions. Moreover, eigenfunctions of the operators $F$, $G_\alpha$ and the form $f$ must belong to the domain of $F$ or, equivalently, to the image of $G_\alpha$. This domain is contained in the domain of the form $f$. Similarly, we may show that if $\phi$ is an eigenfunction of the form $f$ with eigenvalue $\delta$, then $\phi$ is an eigenfunction of $\exp(-tF)$ with eigenvalue $\exp(-t\delta)$ for any positive $t$.
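The eigenvalue relations can be checked in the hypothetical finite-state reversible chain used as an illustration above. The sketch symmetrizes $F$ with $\mathrm{diag}(\sqrt{\pi})$ to obtain real eigenvalues $\delta_j$ and eigenfunctions that are orthonormal in $L^2(\pi)$. It then verifies that these eigenfunctions are also eigenfunctions of $G_\alpha$, with eigenvalue $1/(\delta+\alpha)$, and of $\exp(-tF)$, with eigenvalue $\exp(-t\delta)$.

```python
import numpy as np
from scipy.linalg import expm

pi = np.array([0.1, 0.2, 0.3, 0.4])  # hypothetical stationary distribution
C = np.array([[0.0, 0.3, 0.1, 0.2],
              [0.3, 0.0, 0.4, 0.1],
              [0.1, 0.4, 0.0, 0.5],
              [0.2, 0.1, 0.5, 0.0]])  # symmetric conductances
A = C / pi[:, None]
np.fill_diagonal(A, -A.sum(axis=1))
F = -A                                # self-adjoint in L^2(pi)

D = np.diag(np.sqrt(pi))
D_inv = np.diag(1.0 / np.sqrt(pi))
S = D @ F @ D_inv                     # symmetric matrix similar to F
delta, U = np.linalg.eigh(S)          # real eigenvalues, ascending; delta[0] = 0
psi = D_inv @ U                       # eigenfunctions of F (columns), orthonormal in L^2(pi)

alpha, t = 2.0, 0.5
G = np.linalg.inv(alpha * np.eye(4) + F)  # resolvent G_alpha
P = expm(-t * F)                          # exp(-tF)
print(np.allclose(G @ psi, psi / (alpha + delta)))   # eigenvalue 1 / (delta + alpha)
print(np.allclose(P @ psi, psi * np.exp(-t * delta)))  # eigenvalue exp(-t delta)
```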

An eigenfunction $\psi$ of the generator $F$ satisfies:

$$E[\psi(x_{t+s})\mid x_t] = \exp(-\delta s)\psi(x_t), \qquad (5.6)$$

for some positive number $\delta$ and each transition interval $s$. Thus the NPCs described previously will also satisfy the testable conditional moment restrictions (5.6). The scalar process $\{\psi(x_t)\}$ should behave as a scalar autoregression with autoregressive coefficient $\exp(-\delta s)$ for sample interval $s$. The forecast error $\psi(x_{t+s}) - \exp(-\delta s)\psi(x_t)$ will typically be conditionally heteroskedastic (have a conditional variance that depends on the Markov state $x_t$).
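The autoregressive behavior and the heteroskedastic forecast errors can be seen in a simulated example. The sketch assumes a scalar Ornstein-Uhlenbeck process $dx = -\kappa x\,dt + dB$ with $\kappa = 1/2$, so that the stationary law is $N(0,1)$. It is sampled exactly at unit intervals. The second NPC $\psi_2(x) = (x^2-1)/\sqrt2$ has $\delta_2 = 2\kappa$. The sketch regresses $\psi_2(x_{t+1})$ on $\psi_2(x_t)$ and compares the residual variance in low- and high-$|x_t|$ regions. All parameter values are illustrative choices.

```python
import numpy as np

kappa, s = 0.5, 1.0  # illustrative OU parameter and sampling interval; dx = -kappa x dt + dB
rho = np.exp(-kappa * s)
T = 200_000
rng = np.random.default_rng(0)
x = np.empty(T)
x[0] = rng.standard_normal()  # start in the stationary N(0, 1) law
eps = rng.standard_normal(T - 1) * np.sqrt(1 - rho**2)
for t in range(1, T):
    x[t] = rho * x[t - 1] + eps[t - 1]  # exact discrete sampling at interval s

psi2 = (x**2 - 1) / np.sqrt(2)  # second NPC (Hermite polynomial), delta_2 = 2 kappa
y_now, y_next = psi2[:-1], psi2[1:]
slope = (y_now @ y_next) / (y_now @ y_now)  # AR(1) coefficient; psi2 has mean zero
resid = y_next - slope * y_now
print(slope, np.exp(-2 * kappa * s))        # estimate versus exp(-delta_2 s)

# Conditional heteroskedasticity: forecast-error variance depends on the state x_t.
small = np.abs(x[:-1]) < 0.5
large = np.abs(x[:-1]) > 1.5
print(resid[small].var(), resid[large].var())
```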

Since the form $f$ can be depicted using a principal component decomposition as in (2.3), analogous decompositions are applicable to $F$ and $\exp(-tF)$:

$$F\phi = \sum_{j=0}^\infty \delta_j\langle\psi_j,\phi\rangle\psi_j, \qquad \exp(-tF)\phi = \sum_{j=0}^\infty \exp(-t\delta_j)\langle\psi_j,\phi\rangle\psi_j,$$

where the first expansion is only a valid series when $\phi$ is in the domain of the operator $F$. When the eigenvalues $\delta_j$ of the form increase rapidly (in $j$), the terms $\exp(-t\delta_j)$ will decline to zero rapidly (in $j$), more so when the time horizon $t$ becomes large. As a consequence, it becomes easier to approximate the conditional expectation operator over a finite transition interval $t$ with a smaller number of NPCs. On the other hand, slow eigenvalue divergence of the form will make it challenging to approximate the transition operators with a small number of NPCs. Our results in Section 3 give primitive conditions, based on the tail behavior of the stationary density and the diffusion matrix, for the existence of the above eigenfunction decompositions. In an earlier, longer version of our paper we provided primitive sufficient conditions for the speed of eigenvalue decay.
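The effect of the horizon $t$ on truncation can be illustrated with a minimal sketch. It assumes the same illustrative Ornstein-Uhlenbeck case: stationary law $N(0,1)$, normalized Hermite NPCs, and $\delta_j = j\kappa$ with $\kappa = 1/2$. It truncates the expansion of $\exp(-tF)\phi$ for the hypothetical test function $\phi(x) = \cos x$. The result is compared with the exact conditional expectation $E[\cos x_t\mid x_0] = \cos(\rho x_0)\exp(-(1-\rho^2)/2)$, where $\rho = e^{-\kappa t}$. With the same number of terms, the error is much smaller at the longer horizon.

```python
import numpy as np
from math import factorial
from numpy.polynomial import hermite_e as He

kappa = 0.5                             # illustrative OU: delta_j = j * kappa
nodes, weights = He.hermegauss(80)
weights = weights / np.sqrt(2 * np.pi)  # integrate against the N(0, 1) density

def cond_exp_truncated(phi, x0, t, n_terms):
    """sum_{j < n_terms} exp(-t delta_j) <phi, He_j> / j! * He_j(x0)."""
    total = 0.0
    for j in range(n_terms):
        e_j = np.zeros(j + 1)
        e_j[-1] = 1.0                   # coefficient vector selecting He_j
        c_j = np.sum(weights * phi(nodes) * He.hermeval(nodes, e_j)) / factorial(j)
        total += np.exp(-t * j * kappa) * c_j * He.hermeval(x0, e_j)
    return total

def cond_exp_exact_cos(x0, t):
    """E[cos(x_t) | x_0 = x0] for the OU process above."""
    rho = np.exp(-kappa * t)
    return np.cos(rho * x0) * np.exp(-(1 - rho**2) / 2)

x0 = 1.0
for t in (0.5, 2.0):
    for n in (2, 4, 8):
        err = abs(cond_exp_truncated(np.cos, x0, t, n) - cond_exp_exact_cos(x0, t))
        print(f"t={t}, terms={n}, error={err:.1e}")
```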

5.4. An alternative form. In this subsection we construct a second quadratic form used to depict the long-run variance of stochastic processes constructed from the Markov process $\{x_t\}$.

This quadratic form is defined to be the limit

$$g(\phi,\psi) = 2\lim_{\alpha\downarrow 0}\langle G_\alpha\phi,\psi\rangle$$

and is well defined on a subspace $\mathcal{S}(F)$ of functions $\phi$ in $L^2$ for which

$$\lim_{\alpha\downarrow 0}\langle G_\alpha\phi,\phi\rangle < \infty.$$

While the form $f$ is used to define the operator $F$, the form $g$ may be used to define $F^{-1}$, as is evident from formulas (5.3) or (5.4). The forms $f$ and $g$ share eigenfunctions: an eigenfunction with $f$ eigenvalue $\delta > 0$ has $g$ eigenvalue $2/\delta$, so the $g$ eigenvalues are proportional to the reciprocals of the $f$ eigenvalues. In light of equation (5.5),

$$\langle G_\alpha\phi,\psi\rangle = \int_0^\infty\exp(-\alpha t)\,E[\phi(x_t)\psi(x_0)]\,dt. \qquad (5.7)$$

Hence, using (5.4), we obtain:

$$g(\phi,\psi) = \lim_{\alpha\downarrow 0} 2\langle G_\alpha\phi,\psi\rangle = \lim_{\alpha\downarrow 0} 2\langle(\alpha I + F)^{-1}\phi,\psi\rangle.$$

Notice that this form is symmetric because the resolvent operator is self-adjoint for any positive $\alpha$. Using (5.7) and reversibility, we may write this form as

$$g(\phi,\psi) = \int_{-\infty}^{+\infty} E[\phi(x_t)\psi(x_0)]\,dt = \int_{-\infty}^{+\infty} E[\psi(x_t)\phi(x_0)]\,dt.$$

Proposition 5.2. The $j$th nonlinear principal component $\psi_j$ for $j \ge 1$ solves:

$$\max_{\phi} g(\phi,\phi)$$

subject to

$$\langle\phi,\phi\rangle = 1, \qquad \langle\psi_s,\phi\rangle = 0, \quad s = 0,\ldots,j-1,$$

where $\psi_0$ is initialized to be the constant function one.

Recall that the spectral density function at frequency $\theta$ for a stochastic process $\{\phi(x_t)\}$ is defined to be

$$\int_{-\infty}^{+\infty}\exp(-i\theta t)\,E[\phi(x_t)\phi(x_0)]\,dt$$

whenever this integral is well defined. In particular, $g(\phi,\phi)$ is the spectral density of the process $\{\phi(x_t)\}$ at frequency zero, a well-known measure of the long-run variance.
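For a normalized eigenfunction $\psi$ with eigenvalue $\delta$, (5.6) together with stationarity gives $E[\psi(x_t)\psi(x_0)] = \exp(-\delta|t|)$. The minimal sketch below uses this with an illustrative value of $\delta$. It computes the spectral density at frequency zero numerically, confirming $g(\psi,\psi) = 2/\delta$. It also shows that $\mathrm{Var}\left(\int_0^T\psi(x_s)\,ds\right)/T$ approaches the same value as $T$ grows.

```python
import numpy as np
from scipy.integrate import quad

delta = 0.5  # hypothetical eigenvalue of F

def acov(t):
    """E[psi(x_t) psi(x_0)] for a normalized eigenfunction psi."""
    return np.exp(-delta * abs(t))

# g(psi, psi): spectral density at frequency zero, integrating over (-inf, inf).
half, _ = quad(acov, 0.0, np.inf)
g_val = 2.0 * half
print(g_val, 2.0 / delta)  # both approximately 4.0

# Var(int_0^T psi(x_s) ds) = 2 int_0^T (T - u) acov(u) du; divided by T it tends to g.
for T in (10.0, 100.0, 1000.0):
    var_T, _ = quad(lambda u, T: 2.0 * (T - u) * acov(u), 0.0, T, args=(T,))
    print(T, var_T / T)
```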

For an alternative but closely related defense of the term long-run variance, suppose that $\phi = F\psi$ for some $\psi$ in the domain of $F$. Then

$$\psi(x_t) - \psi(x_0) + \int_0^t\phi(x_s)\,ds$$

is a martingale adapted to the Markov filtration. Following Bhattacharya [6] and Hansen and Scheinkman [21], we may use this martingale construction to justify a central limit approximation for $\frac{1}{\sqrt{T}}\int_0^T\phi(x_s)\,ds$. Thus $g(\phi,\phi)$ is the limiting variance for the process $\{\frac{1}{\sqrt{T}}\int_0^T\phi(x_s)\,ds\}$ as the sample length $T$ goes to infinity.

This gives us an alternative interpretation of our NPCs. We may base the extraction on maximizing $g(\phi,\phi)$ subject to $\langle\phi,\phi\rangle = 1$ and orthogonality constraints. In words, we are maximizing long-run variation while constraining the overall variation. Smooth functions of a Markov state tend to be highly persistent and, as a consequence, have large long-run variation.

6. Irreversible diffusions. The stationary Markov construction we used in the previous section resulted in a generator that was self-adjoint and hence a process that was time reversible. Even among the class of stationary Markov diffusions, reversibility is special when the process has multiple dimensions. Given a stationary density $q$ and a diffusion matrix $\Sigma$, we have seen how to construct a reversible diffusion, but typically there are other diffusions that share the same density and diffusion matrix but are not reversible. We now characterize the drifts of such processes.
Instead of constructing a Markov process implied by a form, suppose we have specified the process as a semigroup of conditional expectation operators indexed by the transition interval. We suppose this process has stationary density $q$. Following Nelson [25] and Hansen and Scheinkman [21], we study the semigroup of conditional expectation operators on the space $L^2$. This semigroup has a generator $A$ defined on a dense subspace of $L^2$. Consistent with our construction of $F$, we suppose that on the subspace $C_K^2$ the generator $A$ can be represented as a second-order differential operator:

$$A\phi = \sum_{j=1}^n\mu_j\frac{\partial\phi}{\partial y_j} + \frac12\sum_{i,j=1}^n\sigma_{ij}\frac{\partial^2\phi}{\partial y_i\partial y_j},$$
and that 



It may be shown that

$$\int(A\phi)\,q = 0$$

and

$$-\int\phi\,(A\phi)\,q = f(\phi,\phi) \quad \text{on } C_K^2.$$

This construction does not require that $A = -F$ or that $A$ be self-adjoint. How can the adjoint be represented? The adjoint must satisfy

$$\int(A\phi)\,\psi\,q = \int\phi\,(A^*\psi)\,q,$$

implying that the $F$ constructed previously must satisfy $F = -(A + A^*)/2$. Moreover, since $q$ is also the stationary density of the reverse-time process,

$$\int(A^*\phi)\,q = 0.$$



It follows from Nelson [25] that the adjoint operator has the same diffusion matrix, but a different drift vector. The drift for the adjoint operator $A^*$ is given by:

$$\mu_j^* = -\mu_j + \frac{1}{q}\sum_{i=1}^n\frac{\partial(q\sigma_{ij})}{\partial y_i}.$$

The adjoint operator generates the semigroup of expectation operators for the reverse-time diffusion. From the formula for the reverse-time drift $\mu^*$, it follows that

$$\sum_{j=1}^n\frac{\mu_j + \mu_j^*}{2}\frac{\partial\phi}{\partial y_j} = \frac{1}{2q}\sum_{i,j=1}^n\frac{\partial(q\sigma_{ij})}{\partial y_i}\frac{\partial\phi}{\partial y_j},$$

which is the negative of the second term in representation (5.2) for $F_o$. Thus if the generator $A$ of the semigroup is not self-adjoint, then the operator $F$ implied by the form is a second-order differential operator built using a simple average of the forward and reverse-time drift coefficients, $\mu$ and $\mu^*$, and the common diffusion matrix, $\Sigma$.

Remark 6.1. The density $q$ and the diffusion matrix $\Sigma$ do place other restrictions on the drift vector $\mu$. Since $q$ is the stationary density, $\mu$ and $\mu^*$ must also satisfy:

$$\sum_{j=1}^n\frac{\partial(q\mu_j)}{\partial y_j} = \frac12\sum_{i,j=1}^n\frac{\partial^2(q\sigma_{ij})}{\partial y_i\partial y_j}.$$

While there is typically one solution $\mu$ (or $\mu^*$) to this equation in the scalar case, multiple solutions will exist in the multivariate case. That is, unless reversibility is imposed a priori, the drift cannot be identified from the density and diffusion matrix; but the average of the forward and backward drifts can be inferred.
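A two-dimensional example makes the identification problem in Remark 6.1 explicit. Take $q$ proportional to the standard bivariate normal density and $\Sigma = I$, both chosen for illustration. Adding a rotational term $cJy$, with $J$ skew-symmetric, to the reversible drift $-y/2$ leaves the stationarity restriction satisfied for every $c$. The SymPy sketch checks this. It then computes the reverse-time drift $\mu^*$ from the formula in Section 6 and shows that the forward/backward average does not depend on $c$.

```python
import sympy as sp

y1, y2, c = sp.symbols("y1 y2 c", real=True)
ys = (y1, y2)
y = sp.Matrix(ys)
q = sp.exp(-(y1**2 + y2**2) / 2)  # standard normal density, up to a constant
Sigma = sp.eye(2)                  # hypothetical constant diffusion matrix
J = sp.Matrix([[0, -1], [1, 0]])   # skew-symmetric rotation

mu = -y / 2 + c * J * y            # c = 0 gives the reversible drift

# Stationarity restriction of Remark 6.1.
lhs = sum(sp.diff(q * mu[j], ys[j]) for j in range(2))
rhs = sp.Rational(1, 2) * sum(sp.diff(q * Sigma[i, j], ys[i], ys[j])
                              for i in range(2) for j in range(2))
print(sp.simplify(lhs - rhs))      # 0 for every value of c

# Reverse-time drift mu*_j = -mu_j + (1/q) sum_i d(q sigma_ij)/dy_i.
mu_star = sp.Matrix([-mu[j] + sum(sp.diff(q * Sigma[i, j], ys[i]) for i in range(2)) / q
                     for j in range(2)])
print(sp.simplify(mu_star))             # -y/2 - c J y
print(sp.simplify((mu + mu_star) / 2))  # -y/2, free of c
```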

Remark 6.2. The NPC existence results of Section 3 have an immediate extension to the existence of eigenfunctions of the semigroup of conditional expectation operators when the Markov diffusion is not reversible. For a semigroup with generator $A$ we may "invert" equation (5.4) to construct a family of resolvent operators

$$R_\alpha = (\alpha I - A)^{-1}$$

and a form $f(\phi,\psi) = -\langle\phi,A\psi\rangle$, which is not necessarily symmetric. While the generator is an unbounded operator on $L^2$, the resolvent operators are bounded. When the resolvents are compact operators, they have well-defined eigenfunctions and eigenvalues, but these may be complex valued (see Rudin [30], Theorem 4.25, page 108).

Given $\alpha$, the resolvent operator will be compact provided that the image under $R_\alpha$ of the unit ball has compact closure. For $\phi \in L^2$, consider the function given by

$$\varphi = (\alpha I - A)^{-1}\phi.$$

Then $\varphi \in \mathcal{D}(A)$ and

$$\langle\phi,\phi\rangle = \alpha^2\langle\varphi,\varphi\rangle - 2\alpha\langle\varphi,A\varphi\rangle + \langle A\varphi,A\varphi\rangle \ge \alpha^2\langle\varphi,\varphi\rangle + 2\alpha f(\varphi,\varphi).$$

Thus it suffices to show that

$$\{\varphi \in \mathcal{D}(A) : \alpha^2\langle\varphi,\varphi\rangle + 2\alpha f(\varphi,\varphi) \le 1\}$$

has compact closure. This set will have compact closure if, and only if, compactness Condition 2.1 is satisfied for $\theta = \alpha/2$.
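A finite-state illustration of Remark 6.2 uses a hypothetical three-state chain that cycles $0 \to 1 \to 2 \to 0$ at unit rate. This chain is stationary under the uniform distribution but not reversible. The minimal sketch shows three things:

- the resolvent $R_\alpha = (\alpha I - A)^{-1}$ has complex eigenvalues;
- these eigenvalues equal $1/(\alpha - \lambda)$ for the eigenvalues $\lambda$ of $A$;
- the bound $\langle\phi,\phi\rangle \ge \alpha^2\langle\varphi,\varphi\rangle + 2\alpha f(\varphi,\varphi)$ holds for $\varphi = R_\alpha\phi$ with the non-symmetric form $f(\phi,\psi) = -\langle\phi,A\psi\rangle$.

```python
import numpy as np

# Hypothetical non-reversible chain cycling 0 -> 1 -> 2 -> 0 at unit rate.
A = np.array([[-1.0, 1.0, 0.0],
              [0.0, -1.0, 1.0],
              [1.0, 0.0, -1.0]])  # generator acting on functions: (A phi)_i = sum_j A_ij phi_j
pi = np.full(3, 1.0 / 3.0)        # stationary distribution: pi @ A = 0

def inner(a, b):
    """<a, b> in L^2(pi)."""
    return float(np.sum(pi * a * b))

def form(a, b):
    """Non-symmetric form f(a, b) = -<a, A b>."""
    return -inner(a, A @ b)

alpha = 2.0
R = np.linalg.inv(alpha * np.eye(3) - A)  # resolvent R_alpha
eig_R = np.linalg.eigvals(R)
print(eig_R)                              # includes a complex-conjugate pair

# The bound behind the compactness argument, checked on random phi.
rng = np.random.default_rng(0)
bound_holds = True
for _ in range(1000):
    phi = rng.standard_normal(3)
    v = R @ phi                           # v = (alpha I - A)^{-1} phi
    lower = alpha**2 * inner(v, v) + 2 * alpha * form(v, v)
    bound_holds = bound_holds and inner(phi, phi) >= lower - 1e-12
print(bound_holds)
```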

7. Extraction of NPCs from data. In this section we suggest a sieve method to estimate NPCs based on discretely sampled data from a Markov process. Here we only sketch the construction and leave the formal justification and detailed analysis of rates of convergence for subsequent research.

Let $\{x_i\}_{i=1}^T$ be a discrete-time sample of the underlying continuous-time, ergodic Markov diffusion $\{x_t : t \ge 0\}$ on the state space $\Omega$. Suppose that the penalization matrix $\Sigma$ is either known or can be consistently estimated. The invariant probability measure $Q$ is unknown but is consistently estimated by the empirical distribution of the data $\{x_i\}_{i=1}^T$.

Let $\{H_m : m = 1,2,\ldots\}$ be a sequence of increasing finite-dimensional linear (sieve) spaces that approximate the Hilbert space $H$ (the domain of the form $f$) as $m$ goes to infinity. For notational convenience, let $m$ be the dimension of $H_m$, and suppose that $m$ goes to infinity slowly as the sample size $T$ goes to infinity. One strategy is to extract the finite-sample approximations sequentially as in the optimization problem given in Definition 2.1. For a finite-dimensional sieve approximation, it suffices to solve a generalized eigenvector problem.

Since the space $H_m$ is finite-dimensional,

$$H_m = \Big\{\phi : \phi(x) = \sum_{k=1}^m a_k B_k(x),\ a_k \in \mathbb{R}\Big\},$$

where the basis functions $\{B_k(x) : k \ge 1\}$ are used to construct the sieve. For example, $\{B_k(x) : k \ge 1\}$ could be one of the following: (i) thin-plate splines, radial-basis wavelets or tensor-product wavelets if $q$ has algebraic tails on its support $\Omega = \mathbb{R}^n$; (ii) a tensor-product Hermite polynomial basis or a Gaussian radial basis if $q$ has exponentially thin tails on its support $\Omega = \mathbb{R}^n$.

Footnote: If available, a continuous-time record could be used, but statistical approximation remains an issue because the length of the record is finite.

Form a vector $\beta_m$ of functions of $x$ by stacking the terms $B_k(x)$, $k = 1,\ldots,m$, and form two matrices:

$$V_T = \frac{1}{2T}\sum_{i=1}^T[\nabla\beta_m(x_i)]'\,\Sigma(x_i)\,[\nabla\beta_m(x_i)], \qquad W_T = \frac{1}{T}\sum_{i=1}^T\beta_m(x_i)\beta_m(x_i)',$$

where $\nabla\beta_m(x)$ is the $n \times m$ matrix whose $k$th column is $\nabla B_k(x)$ and $'$ denotes transpose. Both matrices are symmetric and positive semidefinite by construction. The matrix $W_T$ is typically nonsingular, while $V_T$ is singular when a constant function is in the sieve space $H_m$. Stack the coefficients on the sieve basis into a vector $a_T$. The sample counterpart to the NPC (or eigenfunction) problem is the following generalized eigenvector problem:

$$V_T a_T = \delta_T W_T a_T, \qquad (7.1)$$

where $a_T$ is a generalized eigenvector and $\delta_T$ a generalized eigenvalue. When $W_T$ is positive definite, we may apply the Cholesky decomposition to transform this generalized eigenvector problem into a standard eigenvector problem.

Associated with each eigenvector solution to (7.1) is an eigenfunction formed by multiplying the coefficient entries of the eigenvector by the sieve basis functions. We have constructed this sample problem so that one of the approximating eigenfunctions will be constant whenever there is a nonzero constant function in $H_m$, and the associated eigenvalue is zero.
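The sieve recipe can be sketched end to end on simulated data. This is one possible implementation under stated assumptions, not the authors' code. The data come from a scalar Ornstein-Uhlenbeck process $dx = -\kappa x\,dt + \sigma\,dB$ sampled exactly at unit intervals, with $\kappa = 1/2$, $\sigma = 1$ and $\Sigma = \sigma^2$ treated as known. The sieve is the monomial basis $1, x, x^2, x^3$, which spans the same space as the Hermite polynomials of degree at most three, in line with option (ii). $V_T$ carries the factor $1/2$ so that it is the sample counterpart of $f$. The function names `simulate_ou` and `sieve_npcs` are hypothetical helpers. For this model the population eigenvalues are $0, \kappa, 2\kappa, 3\kappa$.

```python
import numpy as np

def simulate_ou(T, kappa, sigma, rng):
    """Exact unit-interval sampling of dx = -kappa x dt + sigma dB, started at stationarity."""
    s2 = sigma**2 / (2 * kappa)  # stationary variance
    rho = np.exp(-kappa)         # lag-one autocorrelation
    x = np.empty(T)
    x[0] = rng.normal(0.0, np.sqrt(s2))
    shocks = rng.normal(0.0, np.sqrt(s2 * (1 - rho**2)), size=T - 1)
    for t in range(1, T):
        x[t] = rho * x[t - 1] + shocks[t - 1]
    return x

def sieve_npcs(x, sigma2, m):
    """Solve V_T a = delta W_T a, as in (7.1), for a scalar state with basis 1, x, ..., x^(m-1)."""
    powers = np.arange(m)
    beta = x[:, None] ** powers                                  # B_k(x_i), shape (T, m)
    dbeta = np.zeros_like(beta)
    dbeta[:, 1:] = powers[1:] * x[:, None] ** (powers[1:] - 1)   # B_k'(x_i)
    T = len(x)
    V = 0.5 * (dbeta * sigma2(x)[:, None]).T @ dbeta / T        # sample counterpart of f
    W = beta.T @ beta / T                                        # sample second moments
    L = np.linalg.cholesky(W)                                    # W = L L'
    L_inv = np.linalg.inv(L)
    delta, b = np.linalg.eigh(L_inv @ V @ L_inv.T)               # standard symmetric problem
    a = L_inv.T @ b                                              # coefficients on the sieve basis
    return delta, a

rng = np.random.default_rng(0)
kappa, sigma = 0.5, 1.0  # stationary law N(0, 1)
x = simulate_ou(200_000, kappa, sigma, rng)
delta, a = sieve_npcs(x, lambda y: np.full_like(y, sigma**2), m=4)
print(np.round(delta, 3))  # compare with 0, kappa, 2 kappa, 3 kappa
```

The smallest generalized eigenvalue is zero up to rounding, and its eigenvector picks out the constant function, as described in the text.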



8. Conclusions and related literature. We have studied NPCs from 
multiple vantage points. We have explored their role in capturing variation 
subject to smoothness constraints and their role in capturing long-run vari- 
ation in time series modeling. We have also considered their use in approxi- 
mation where the smoothness constraints limit the family of functions to be 
approximated. 

We also used multivariate Markov diffusions as data-generating devices to interpret our NPCs. These NPCs are eigenfunctions of conditional expectation operators when the Markov process is reversible and hence imply conditional moment restrictions. Our analysis expands on the result of Hansen and Scheinkman [21] that reversible diffusions can be identified nonparametrically from discrete-time, low-frequency stationary observations.

Footnote: For reversible diffusions we could instead approximate the NPCs nonparametrically by maximizing autocorrelations. For scalar diffusion models, this method has already been considered in Chen et al. and Gobet et al. [19]. Both papers used a wavelet sieve as the approximating space.

For more general diffusions, these NPCs are orthogonal and have orthogonal innovations, analogous to those from the canonical analysis of Box and Tiao [7] for linear multiple time series models (see also Pan and Yao [26]). Thus our NPC construction provides a convenient way to summarize implications of multivariate nonlinear diffusion models. Given the nonlinearity in the state variables, it is a nontrivial task to infer the global dynamics, and in particular the long-run behavior, from this local specification based on low-frequency data. Our characterization of NPCs offers a way to characterize features of the implied time series that are typically hidden in the local dynamics. While we featured diffusion processes, more general processes, including processes with jumps, can be accommodated by expanding the types of forms that are considered.

The idea of using eigenfunctions of conditional expectation operators for estimation and testing of Markov processes based on low-frequency data has been suggested previously by Demoura [15], Hansen and Scheinkman [21], Kessler and Sorensen [24], Hansen et al. [22], Florens et al. [17], Chen et al. and Gobet et al. [19]. In particular, Kessler and Sorensen [24] use eigenfunctions to construct quasi-optimal estimators of parametric scalar diffusion models of the drift and diffusion coefficients from discrete-time data in the special case in which the functional forms of the eigenfunctions are known a priori. Hansen and Scheinkman [21], Hansen et al. [22], Chen et al., Gobet et al. [19] and Darolles et al. [11] study semiparametric and nonparametric identification and over-identification based on an eigenfunction extraction that is closely related to the one analyzed here; see Fan [16] for a recent review. This previous literature focuses primarily on scalar diffusion models and, in some cases, on scalar diffusions on compact state spaces with reflective boundaries. Our analysis of Markov diffusions extends to multivariate settings applicable to processes without attracting barriers.

In this paper we have characterized a particular type of functional principal components motivated in part by long-run implications of multivariate Markov diffusions. This is a natural first step. Inferential issues, while crucial, are beyond the scope of this paper. Formalizing statistical comparisons of models and data in a multivariate setting is an obvious next step, supported by parametric, semiparametric or nonparametric estimation. There are a number of recent statistical results on estimation and inference for functional principal components of covariance operators associated with i.i.d. or longitudinal samples of curves. See, e.g., Silverman, Ramsay and Silverman [28], Hall et al. [20], Benko et al. [4] and Zhou et al. [35]. These existing results can in principle be modified to establish asymptotic properties of our estimated NPCs from discrete-time, low-frequency realizations of an underlying multivariate Markov diffusion model.
APPENDIX A: PROOFS

Proof of Claim 2.2. In solving the maximization component of the problem, first limit the $\phi$'s to be in $H_N$ but orthogonal to $H$. This can only reduce the maximized value. The space of such $\phi$'s contains more than just the zero element because $H_N$ has $N+1$ dimensions. Write $\phi$ as $\phi = \sum_{j=0}^N r_j\psi_j$. Since $\mathrm{Proj}(\phi|H) = 0$, the objective can be expressed as $\sum_{j=0}^N (r_j)^2\lambda_j$. The constraint set implies that

$$\sum_{j=0}^N (r_j)^2 \le 1$$

because $f(\psi_j,\psi_\ell) = \langle\psi_j,\psi_\ell\rangle = 0$ for $j \ne \ell$. While the coefficients $r_j$ cannot be freely chosen ($\phi$ must be orthogonal to $H$), they can be scaled so that the constraint is satisfied with equality. Since the sequence of $\lambda_j$'s is decreasing, the maximized objective must be no less than $\lambda_N$. □

Proof of Claim 2.3. Write $\phi$ as $\phi = \mathrm{Proj}(\phi|H_{N-1}) + \varphi$, where $\varphi$ is in $H_{N-1}^\perp$. Write:

$$\mathrm{Proj}(\phi|H_{N-1}) = \sum_{j=0}^{N-1} r_j\psi_j.$$

Using this decomposition, the objective can be written as $\langle\varphi,\varphi\rangle$, and the constraint set can be written as:

$$\sum_{j=0}^{N-1}(r_j)^2 + \theta\langle\varphi,\varphi\rangle + f(\varphi,\varphi) \le 1,$$

because $\psi_0,\psi_1,\ldots,\psi_{N-1},\varphi$ are orthogonal, and $f(\psi_j,\varphi) = f(\psi_j,\psi_i) = 0$ for $j = 0,\ldots,N-1$ and $i = j+1, j+2,\ldots,N-1$. To maximize the objective, the coefficients $r_j$ are set to zero and $\varphi$ is chosen by solving Problem 2.1 for $H = H_{N-1}$. The conclusion follows. □



Proof of Proposition 3.1. Hansen et al. [22] consider densities from
stationary scalar diffusions, whose boundaries are not attracting. This propo- 
sition gives an equivalent statement of their compactness condition, written 
in terms of the stationary density. The scalar diffusion coefficient in their 
analysis is □ 

To show that the form $f$ is a closed extension of $f_o$, we verify that $H$ is a Hilbert space.

Proposition A.1. $H$ is a Hilbert space.
Proof of Proposition A.1. Let $\Lambda$ be the symmetric square root of the penalty matrix $\Sigma$. If $\{\phi_j\}$ is a Cauchy sequence in $H$, then $\{\phi_j\}$ and the entries of $\{\Lambda\nabla\phi_j\}$ form Cauchy sequences in $L^2$. Denote the limits in $L^2$ as

$$\phi = \lim_{j\to\infty}\phi_j, \qquad v = \lim_{j\to\infty}\Lambda\nabla\phi_j.$$

For each $u \in C_K^2$ we know that:



where $\frac{\partial u}{\partial x}$ is the partial derivative of $u$ with respect to $x$. Since $\Sigma$ is positive definite and continuous on any compact subset of $\Omega$, and $u$ vanishes outside any such set, it follows that



Hence $\phi \in H$ with $\nabla\phi = \Lambda^{-1}v$. Moreover, $\phi_j \to \phi$ in $H$. □

We now present a criterion for Condition 3.1 to hold. This result is due essentially to Azencott [2] and Davies [13].

Proposition A.2. Consider a form $f_o$ that satisfies the Beurling-Deny Condition 5.1. Let $\tilde f$ denote the minimal extension of $f_o$, with domain $\mathcal{D}(\tilde f)$. Suppose that $1 \in \mathcal{D}(\tilde f)$ and $\tilde f(1,\phi) = 0$ for all $\phi \in \mathcal{D}(\tilde f)$. Then $f = \tilde f$.

Proof of Proposition A.2. As explained in Section 5, associated with the forms $f$ and $\tilde f$ we may construct operators $F$ and $\tilde F$ and resolvents $G$ and $\tilde G$. Integration by parts can be used to show that the operators $F$ and $\tilde F$ are extensions of the differential operator



defined on $C_K^2$. The forms $f$ and $\tilde f$ satisfy the Beurling-Deny Condition 5.1 (Davies [14], Theorem 1.3.5). Hence, as stated in Davies [14], the operators $F$ and $\tilde F$ can be extended to subspaces of $L^1$. Similarly, the resolvents $G$ and $\tilde G$ can be extended to $L^1$. We will denote the extended operators as $F^1$, $\tilde F^1$, $G^1$ and $\tilde G^1$. Since $q$ is integrable, $L^2$ convergence implies $L^1$ convergence, and consequently $F$ and $\tilde F$ are restrictions of $F^1$ and $\tilde F^1$, respectively. The same holds for the resolvent operators.

If $\tilde f(1,\phi) = 0$ for all $\phi \in \mathcal{D}(\tilde f)$, then $\tilde F 1 = 0$ and $\tilde G 1 = 1$. Consequently $\tilde G^1 1 = 1$. It follows from Theorem 2.2 in Davies [13] that $C_K^2$ is a core for $\tilde F^1$, in the sense that $\tilde F^1$ is the closure in $L^1$ of $L$ restricted to $C_K^2$. Hence $C_K^2$ is a core for $F^1$, and thus a core for $f$; equivalently, $f$ and $\tilde f$ coincide. □

Proof of Proposition 3.2. Since $\tilde f$ is the minimal closed extension, it has $C_K^2$ as its core. When this condition is met, a sequence of functions $\{\phi_j\}$ in $C_K^2$ can be constructed that converges to 1 in $L^2$ and for which $\tilde f(\phi_j,\phi_j)$ converges to zero. See Fukushima et al. [18], Theorem 1.6.6 and Theorem 1.6.7. An approximating sequence of functions with compact support is supplied by Fukushima et al. [18] in the proof of Theorem 1.6.7. This sequence can be smoothed using a suitable regularization to produce a corresponding approximating sequence in $C_K^2$. Thus the unit function is in the domain of $\tilde f$ and $\tilde f(1,\phi) = 0$ for $\phi \in C_K^2$, and hence for $\phi \in \mathcal{D}(\tilde f)$. As we established above, this is sufficient for Condition 3.1. □

Proof of Proposition 13.31 Since V is bounded from below, we may 
choose a 6 > such that V + is nonnegative. Construct the space: 

and J ipVip = — J gif, for all tp £ C^}. 

As in the proof of Proposition A.1, it follows that $H$ is a Hilbert space with inner product

$$\langle \psi_1, \psi_2 \rangle_H = \int (V + \theta + 1)\psi_1\psi_2 + \int (\nabla\psi_1)'\Sigma(\nabla\psi_2).$$



We show that $U$ maps the domain $\mathcal{D}(f)$ of $f$ into $H$.

Let $\phi \in \mathcal{D}(f)$. Since $C_K^2$ is a core for $f$, there exists a sequence $\{\phi_j : j = 1, 2, \ldots\}$ in $C_K^2$ that converges to $\phi$ in the form norm of $f$. Hence this sequence is Cauchy in that norm. Writing $\psi_j = U\phi_j$ and applying equation (3.5), we obtain

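$$\int (\phi_j - \phi_k)^2(1 + \theta)q + \int (\nabla\phi_j - \nabla\phi_k)'\Sigma(\nabla\phi_j - \nabla\phi_k)q = \int (V + \theta + 1)(\psi_j - \psi_k)^2 + \int (\nabla\psi_j - \nabla\psi_k)'\Sigma(\nabla\psi_j - \nabla\psi_k).$$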



Footnote. Davies assumes that the coefficients of $L$ are $C^\infty$. However, the proof holds for $C^1$ coefficients, since elliptic regularity holds even when the coefficients are only Lipschitz (see Theorem 6.3 of Agmon [1]).
Thus $\{\psi_j : j = 1, 2, \ldots\}$ is Cauchy in the Hilbert space norm of $H$, and its limit point $\psi$ must satisfy $\psi = U\phi$. Notice that $\int V\psi^2 + \int (\nabla\psi)'\Sigma(\nabla\psi)$ equals the squared $H$ norm of $\psi$ minus $(\theta + 1)\int \psi^2$. Thus

$$\int V\psi^2 + \int (\nabla\psi)'\Sigma(\nabla\psi) = \lim_{j\to\infty} \int (\nabla\phi_j)'\Sigma(\nabla\phi_j)\, q = \int (\nabla\phi)'\Sigma(\nabla\phi)\, q.$$

This proves (3.7).
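The identity behind the two displays above can be checked in a scalar example. The sketch assumes one dimension, $\Sigma = 1$, $q = \exp(-2h)$ for a hypothetical $h(x) = x^2/4 + x^4/40$, and $\psi = \exp(-h)\phi$. In this special case $e^{-h}\phi' = \psi' + h'\psi$; expanding the square and integrating $2h'\psi\psi' = h'(\psi^2)'$ by parts gives $\int (\phi')^2 q = \int (\psi')^2 + \int V\psi^2$ with $V = (h')^2 - h''$. That form of $V$ is an assumption of the sketch, not a quotation of the definition in Section 3. The function $\phi(x) = \sin x + x^2/2$ is not compactly supported, but $\psi$ decays fast enough that the boundary terms vanish numerically.

```python
import math

# Hypothetical scalar setting: Sigma = 1, q = exp(-2h), psi = exp(-h) * phi,
# and (assumed scalar form of the potential) V = (h')^2 - h''.

def h(x):
    return 0.25 * x**2 + x**4 / 40.0

def dh(x):
    return 0.5 * x + x**3 / 10.0

def d2h(x):
    return 0.5 + 0.3 * x**2

def V(x):
    return dh(x) ** 2 - d2h(x)

def phi(x):
    return math.sin(x) + 0.5 * x * x

def dphi(x):
    return math.cos(x) + x

def psi(x):
    return math.exp(-h(x)) * phi(x)

def dpsi(x):
    # Product rule: (exp(-h) phi)' = exp(-h) (phi' - h' phi).
    return math.exp(-h(x)) * (dphi(x) - dh(x) * phi(x))

def trapezoid(func, a=-8.0, b=8.0, n=16000):
    """Composite trapezoid rule for func on [a, b] with n subintervals."""
    step = (b - a) / n
    total = 0.5 * (func(a) + func(b))
    for i in range(1, n):
        total += func(a + i * step)
    return total * step

lhs = trapezoid(lambda x: dphi(x) ** 2 * math.exp(-2.0 * h(x)))  # integral of (phi')^2 q
rhs = trapezoid(lambda x: dpsi(x) ** 2 + V(x) * psi(x) ** 2)     # integral of (psi')^2 + V psi^2
print(f"{lhs:.10f}  {rhs:.10f}")
```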

For a given $\psi = U\phi$, our candidate for the weak derivative is

$$g = \exp(-h)(-\phi\nabla h + \nabla\phi).$$

To verify that $g$ is indeed the weak derivative, we must show that for any $\vartheta \in C_K^\infty$

$$\int \psi\nabla\vartheta = -\int g\vartheta, \tag{A.1}$$

and

$$\int g'\Sigma g < \infty. \tag{A.2}$$

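We check relation (A.1) by applying integration by parts:

$$\begin{aligned}
-\int g\vartheta &= -\int \big[\exp(-h)(-\phi\nabla h + \nabla\phi)\big]\vartheta = -\int \nabla\phi\exp(-h)\vartheta + \int \exp(-h)\vartheta\phi\nabla h \\
&= \int \phi\big[\exp(-h)\nabla\vartheta - \nabla h\exp(-h)\vartheta\big] + \int \exp(-h)\vartheta\phi\nabla h = \int \psi\nabla\vartheta.
\end{aligned}$$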

Inequality (A.2) follows from (3.2). □
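Relation (A.1) can also be checked numerically for a $\phi$ that has only a weak derivative. The sketch uses hypothetical choices throughout: one dimension, $h(x) = x^2/4 + x^4/40$, $\phi(x) = |x|$ with weak derivative $\operatorname{sign}(x)$, and the $C^2$ bump $\vartheta(x) = (1 - s^2)^3$ for $|s| < 1$, $s = (x - 0.3)/1.5$, as the test function. The integrals are split at the kink $x = 0$, using the one-sided derivative of $|x|$ on each piece, so the trapezoid rule only sees smooth integrands.

```python
import math

# Hypothetical scalar setting: phi(x) = |x| (weak derivative sign(x)),
# psi = exp(-h) * phi, and g = exp(-h) * (-phi * h' + phi').

def h(x):
    return 0.25 * x**2 + x**4 / 40.0

def dh(x):
    return 0.5 * x + x**3 / 10.0

def psi(x):
    return math.exp(-h(x)) * abs(x)

def g(x, slope):
    """Candidate weak derivative; slope is the one-sided derivative of |x| (-1 or +1)."""
    return math.exp(-h(x)) * (-abs(x) * dh(x) + slope)

CENTER, RADIUS = 0.3, 1.5  # hypothetical test-function parameters

def theta(x):
    """C^2 bump (1 - s^2)^3 supported on |s| < 1, with s = (x - CENTER) / RADIUS."""
    s = (x - CENTER) / RADIUS
    return (1.0 - s * s) ** 3 if abs(s) < 1.0 else 0.0

def dtheta(x):
    s = (x - CENTER) / RADIUS
    return -6.0 * s * (1.0 - s * s) ** 2 / RADIUS if abs(s) < 1.0 else 0.0

def trapezoid(func, a, b, n=20000):
    """Composite trapezoid rule for func on [a, b] with n subintervals."""
    step = (b - a) / n
    total = 0.5 * (func(a) + func(b))
    for i in range(1, n):
        total += func(a + i * step)
    return total * step

# Split at x = 0, where |x| has its kink; the support of theta lies inside [-2, 2].
lhs = (trapezoid(lambda x: psi(x) * dtheta(x), -2.0, 0.0)
       + trapezoid(lambda x: psi(x) * dtheta(x), 0.0, 2.0))      # integral of psi * theta'
rhs = -(trapezoid(lambda x: g(x, -1.0) * theta(x), -2.0, 0.0)
        + trapezoid(lambda x: g(x, 1.0) * theta(x), 0.0, 2.0))   # -integral of g * theta
print(f"{lhs:.10f}  {rhs:.10f}")
```

No boundary term appears at $x = 0$ because $\psi$ is continuous there, which is exactly what makes $g$ a weak derivative despite the kink in $\phi$.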

Proof of Proposition 3.4. Since V is continuous and diverges at the
boundaries, it must be bounded from below. Also, it follows from Assump- 
tion O that 

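$$\mathcal{V}_0 \subset \Big\{\psi \in L^2(\mathrm{leb}) : \psi \text{ has a weak derivative and } \int (\theta + 1 + V)\psi^2 + c\int |\nabla\psi|^2 \le 1\Big\}$$

for some constant $c > 0$. We may then apply the argument in the proof of Theorem XIII.67 of Reed and Simon [29] to establish that $\mathcal{V}_0$ is precompact in $L^2(\mathrm{leb})$. □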
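Precompactness results of this type are typically used to show that the associated operator has a compact resolvent and hence a discrete spectrum. A finite-difference sketch makes this visible in a hypothetical scalar case with $\Sigma = 1$ and $h(x) = x^2/4$, for which the assumed scalar form of the potential, $V = (h')^2 - h''$, equals $x^2/4 - 1/2$ and diverges at the boundaries. The sketch discretizes $-\psi'' + V\psi$ on $[-10, 10]$ with zero boundary values and locates eigenvalues by bisection on a Sturm count (Sylvester's law of inertia applied to the $LDL^T$ pivots).

```python
# Hypothetical scalar case: Sigma = 1, h(x) = x^2/4, so V = (h')^2 - h'' = x^2/4 - 1/2.
# Second-order finite differences for -psi'' + V psi on [-L, L] with zero boundary values.

L, N = 10.0, 1000
dx = 2.0 * L / (N + 1)
grid = [-L + (i + 1) * dx for i in range(N)]
diag = [2.0 / dx**2 + (x * x / 4.0 - 0.5) for x in grid]
off = -1.0 / dx**2  # constant off-diagonal entry of the tridiagonal matrix

def count_below(lam):
    """Number of eigenvalues below lam, from the signs of the LDL^T pivots."""
    count, d = 0, 1.0
    for i, a in enumerate(diag):
        d = a - lam - (off * off / d if i > 0 else 0.0)
        if d == 0.0:
            d = 1e-300  # avoid dividing by an exactly zero pivot
        if d < 0.0:
            count += 1
    return count

def kth_eigenvalue(k, lo=-1.0, hi=10.0, tol=1e-10):
    """k-th smallest eigenvalue (k = 0, 1, ...) by bisection on count_below."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if count_below(mid) > k:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

eigs = [kth_eigenvalue(k) for k in range(3)]
print([round(e, 4) for e in eigs])
```

For this harmonic-oscillator potential the exact eigenvalues are $0, 1, 2, \ldots$, so the three computed values should be close to 0, 1 and 2. The zero eigenvalue corresponds to $\psi = \exp(-h)$, the image of the unit function under $\phi \mapsto \exp(-h)\phi$.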
Proof of Lemma 3.1. Consider a positive function

1 



and note that 



<?(x)^-c Vxix) = -<i{x)Vv + c- ^ ' 



For $\phi$ in $C_K^2$ we may apply integration by parts to show that
's'' - c)Vx • V0 = / ( ^ — — \ i^v • Vv) + ( = ) trace 




dxidxj 



The conclusion follows from Theorem 1.5.12 of Davies [14]. While Davies [14] uses test functions in $C_K^\infty$, the same proof applies to $C_K^2$ test functions. □
Proof of Proposition 5.1. The form $f$ satisfies the Beurling-Deny criteria (Davies [14], Theorem 1.3.5). Thus there exists a self-adjoint operator $F$ which is an extension of $F_0$ and generates a sub-Markov semigroup $\exp(-tF)$. Theorem 7.2.1 of Fukushima et al. guarantees that there exists a Markov process $\{x_t\}$ that has $\exp(-tF)$ as its semigroup of conditional expectations. The semigroup $\exp(-tF)$ conserves probability because the unit function is in the domain of the form $f$ and $f(1, \phi) = 0$ for any $\phi \in \mathcal{D}(f)$. As a consequence, the unit function is also in the domain of the operator $F$, and $F1 = 0$. □
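In a hypothetical Gaussian example the semigroup of conditional expectations can be written down explicitly. Assuming one dimension, $\Sigma = 1$, $q$ the standard normal density and $f(\phi,\psi) = \frac12\int \phi'\psi' q$, one has $F\phi = -\frac12(\phi'' - x\phi')$, and the Markov process is the Ornstein-Uhlenbeck process $dx_t = -\frac12 x_t\,dt + dW_t$, whose transition law given $x_0$ is normal with mean $e^{-t/2}x_0$ and variance $1 - e^{-t}$. The sketch evaluates $\exp(-tF)\phi(x_0) = E[\phi(x_t) \mid x_0]$ by quadrature over that transition density.

```python
import math

# Hypothetical example: Sigma = 1, q standard normal, f(phi, psi) = (1/2) int phi' psi' q,
# i.e. the Ornstein-Uhlenbeck process dx = -x/2 dt + dW with exact Gaussian transitions.

def cond_expectation(func, x0, t, n=400, width=10.0):
    """(exp(-tF) func)(x0) = E[func(x_t) | x_0 = x0], by the trapezoid rule in z,
    where x_t = exp(-t/2) x0 + sqrt(1 - exp(-t)) z and z is standard normal."""
    mean = math.exp(-0.5 * t) * x0
    sd = math.sqrt(1.0 - math.exp(-t))
    step = 2.0 * width / n
    total = 0.0
    for i in range(n + 1):
        z = -width + i * step
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * func(mean + sd * z) * math.exp(-0.5 * z * z)
    return total * step / math.sqrt(2.0 * math.pi)

x0, t = 1.3, 0.7
print(cond_expectation(lambda x: 1.0, x0, t))  # conservation of probability: F1 = 0
print(cond_expectation(lambda x: x, x0, t), math.exp(-t / 2) * x0)
print(cond_expectation(lambda x: x * x - 1.0, x0, t), math.exp(-t) * (x0 * x0 - 1.0))
```

The unit function is preserved, while $x$ and $x^2 - 1$ are eigenfunctions of $F$ with eigenvalues $1/2$ and $1$, so their conditional expectations decay geometrically in $t$; in this example they behave as scalar autoregressions.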